Data+ Program Continues to Grow

May 25, 2016

The Information Initiative’s Data+ Program offers exciting opportunities for undergraduates and graduates to use real data to answer real-world questions, and will start its second official summer program on May 23, 2016.

“Data+ is a fantastic way of organizing mentored interdisciplinary research that engages with real-world problems,” said Ed Balleisen, Vice Provost for Interdisciplinary Studies. He also noted the value of having diverse teams work in close proximity to one another over the program’s 10 weeks. “Not only do team members collaborate on their particular data challenges, but all the teams learn from each other, sometimes through sharing work in progress, sometimes by sharing specific expertise about coding strategies, statistical analysis or other aspects of data analytics.”

During its first year in 2015, the Data+ Project teams were an enormous success, and enrollment has increased almost 50 percent over last year. Sixty-seven students have enrolled in 2016’s Data+ Program. Student project teams work in the same building so that teams can share information and collaborate throughout the program.

The seeds for Data+ were planted in 2011, with a research training grant from the National Science Foundation on Structure and Complex Data that was shared by the Mathematics and Statistics departments, with John Harer as the Principal Investigator. The goal of this grant was to bring the theoretical methodology being developed in Mathematics, Statistics and Machine Learning to young people studying in quantitative fields. The program ran in that form for the next three summers, with stipends for 10 undergrads to work on small data projects. In 2015, the expanded program moved to the Information Initiative at Duke, and was officially named Data+.

This summer, there will be 25 project teams working on a broad spectrum of interdisciplinary data projects. Project teams will work with data from the Transgender Discrimination Survey Project, the Durham Neighborhoods Project, the National Asset Scorecard for Communities of Color, and many others.

Data+ Coordinators Paul Bendich and Ashlee Valente work with each project team mentor to ensure that students understand what each client is requesting from the data they provide. Data+ projects must be well constructed and well thought out, and they must be able to reach specific goals by the end of the 10 weeks.

While 10 of the $5,000 Data+ summer stipends are still covered by the NSF grant, Data+ funds its 55 other stipends and programmatic support with help from the Provost’s Office, the Information Initiative at Duke, the Social Science Research Institute, and clients who would like to have their data be part of a student project.

Data+ clients can be Duke faculty from other disciplines and departments, or organizations outside of Duke, who wish to answer questions about their data, identify trends or clusters, or solve a problem. These clients work with a graduate student or post-doc project mentor on each team, who ensures that the project is addressing the clients’ questions and goals.

Prolific Pigs Team

Prolific Pigs participants:  Manchen (Mercy) Fang; Chris Glynn; Yanmin (Mike) Ma

One example of these projects in action is the Prolific Pigs project from summer 2015. Professor Gabriel Rosenberg from the Women’s Studies Department has a collection of hog breeding journals from the 19th century, and wanted the students to explore the data to see what they could find. Because the digital data consisted of PDF scans of historical documents, the project team had to learn Optical Character Recognition before running statistics on the breeding data. Dr. Rosenberg was thrilled with the results and shared the success with his colleagues, who have reached out to Data+ with additional interdisciplinary project proposals.

The Duke medical community has become a repeat Data+ client, after the success of a MEDx-sponsored Data+ project in 2015. Dr. Geoffrey Ginsburg, director of Duke MEDx, and colleagues Dr. Ephraim Tsalik and Dr. Emily Ko had previously collected micro RNA data that had not yet been analyzed. The Micro RNA Host Response to Infection Project was created to analyze it, and was mentored by Computational Biologist Ashlee Valente. This team studied next-generation miRNA sequence data from individuals affected by bacterial or viral infections, as well as healthy individuals. Their goal was to see if there were any differences in miRNA expression between patients with different infection types. Using Machine Learning, the project team identified a detectable difference in miRNA levels, capable of distinguishing infection classes. This project is currently being developed into a journal article authored by project team members Kelsey Sumner, Christopher Hong, and additional team members, and eventually may be developed into a diagnostic test for clinicians to distinguish viral and bacterial infections. As a result of this project’s success, MEDx is now sponsoring nine more health-related projects for summer 2016, whose project mentors will all be coordinated by Dr. Valente.

Data+ students working on medical projects will have exposure to real medical research questions and see their applications while also benefitting from access to the research, mentorship, and experience of the quantitative science community. “Data+ provides a rich learning experience for future data and clinician scientists, and is an ideal environment to foster collaborations between the School of Medicine and the University,” says Dr. Valente.

Another notable success of Data+ has been the increased engagement of female students in STEM fields. In the summer of 2015, 30 percent of the participants were female. This summer, the Data+ Program is 55 percent female. Adding female Statistics students to Data+ has also bolstered the interdisciplinary nature of the program by expanding Data+ participation from other fields at Duke like Anthropology, Global Health, Social Science, Public Policy, and many others where students may need strong coding and quantitative skills to study important problems.

Moving forward, Dr. Bendich and Dr. Valente hope the Data+ program will continue to grow by building valuable relationships with Data+ clients that lead to more data projects and student opportunities. “We will have at least 65 students this summer who are clearly very excited about doing Data Analytics. We hope this summer will show them what they can do, and more importantly transmit that excitement in a compelling way to a wide variety of communities, both inside the university and outside,” says Dr. Bendich.

By working with city agencies like the Durham Neighborhood Compass, bridging the School of Medicine and Duke Campus with health-related data solutions, and contributing to data-driven discussions of public policy (such as analyzing data from clients like the Samuel DuBois Cook Center on Social Equity’s National Asset Scorecard for Communities of Color), Data+ students are given a broad sense of how they can apply their data skills and make a large impact on the world outside of Duke.

If you are interested in submitting a project proposal for 2017, please contact Paul Bendich, or review the proposal guidelines for Data+ Projects to see how your project can become a part of the program.

Data+ is sponsored by Bass Connections, the Information Initiative at Duke, the Social Science Research Institute, the departments of Mathematics and Statistical Science, MEDx, and the Vice Provost for Research. Other Duke sponsors include Duke Health, Sanford School of Public Policy, Parking and Transportation, Development and Alumni Affairs, Duke Network Analysis Center, Samuel DuBois Cook Center on Social Equity, Professor Peter Lange, and the departments of Biology, Biostatistics and Bioinformatics, and Computer Science. Government funding comes from the National Science Foundation and the National Institutes of Health. Outside funding comes from Geometric Data Analytics, Inc., Sankofa, Inc., and RTI International. Community partnerships, data and interesting problems come from the Durham Neighborhood Compass, the North Carolina Justice Center, the National Center for Gender Equality, the Smithsonian, Public Opinion Strategies, the Triad Health Network, University of North Carolina–Greensboro, and the Dean of Academic Affairs.