Data+

Data+ is a 10-week summer research experience that welcomes Duke undergraduates interested in exploring new data-driven approaches to interdisciplinary challenges. Students join small project teams, collaborating with other teams in a communal environment. They learn how to marshal, analyze, and visualize data, while gaining broad exposure to the modern world of data science.

 

 

Browse 2018 projects 

Become a part of Data+: Submit a project proposal

We're currently accepting proposals from faculty and our partners for 2019 Data+ projects. The deadline for completing this application is November 5th, 2018, at 5 p.m. Please email your completed application to Ariel Dawn (ariel.dawn@duke.edu). If you would like help in developing your proposal, please contact Paul Bendich at bendich@math.duke.edu.

Submit a project proposal (PDF)

To submit a joint Data+/Bass Connections proposal, please review the Bass Connections RFP.

  • "We were extremely impressed with the 'I got this' attitude of our Data+ students! I am already recommending the program to my colleagues."

    — Amy Finnegan, Research Scholar, SSRI, DGHI Evidence Lab

    Big Data for Reproductive Health

  • "We greatly enjoyed working with our Data+ team, and were very pleased with the end results of their work. Highly recommend this program for anyone who is looking to dip their toes into applying data analytics in their world."

    —Richard Biever, Senior Director, Office of Information Technology

    Duke Wireless Data
    Co-Curricular Pathways E-Advisor

  • "I was extremely impressed with our project mentor/manager. I was initially skeptical that a graduate student would be able to run the team and project as well as she did. She not only handled the day to day work but also came up with a lot of the ideas implemented in the project. We're actually going to be hiring some of the students to continue this project through the academic year."

    —Evan Levine, Director, Academic & Media Technologies

    Co-Curricular Pathways E-Advisor

  • "Undergraduate students are very capable and driven. They quickly reproduced prior results on existing datasets and extended methods to accommodate new datasets."

    —Benjamin Lee, Associate Professor of Electrical and Computer Engineering, Duke University

    Data-Driven Improvement of Data Center Performance

  • "The team of students had excellent technical and communications skills. I was impressed at how self-driven they were, and their ability to come up with solutions to some inevitable data issues that they encountered. Their final output was both better presented, and more directly applicable, than I expected at the outset."

    —Emma Rasiel, Professor of Economics, Duke University

    Data and the Global Corporate Bond Market

  • "I am particularly impressed by how students with different (particularly non-computing) backgrounds were able to contribute to various projects in very meaningful ways. They were able to pick up bleeding-edge research and technology useful to the project. The intensive summer experience was really, really useful. At the end of the project, they definitely knew more than I did for some areas of the project!"

    —Jun Yang, Professor of Computer Science, Duke University

    Data & Technology for Fact-Checking

  • "The experience this year was great. We had a dedicated group of students who thought creatively about the problem we brought to them."

    —Natalie Spring, Director of Prospect Research, Management and Analytics, Duke University Development

    Analytical Exploration for Duke Development

  • "Evaluation and Research is a Sustaining Core Element of Crisis Intervention Team. We are very fortunate to be in a position to collaborate with Duke’s Data+ Program. The team has done a phenomenal job with evaluation data from 2002 to 2017. We are looking forward to this continued relationship."

    —Elijah Bazemore, Durham County Sheriff’s Department

    Mental Health Interventions with the Durham Police

  • "Our DATA+ team did wonderful work. I’d wanted a software tool for quantifying human mobility data using network analysis for five years—the team got it done in eight weeks! Paul and Ashlee run a fantastic program that requires minimal oversight on the part of faculty but with maximum returns. Sign me up for next year!"

    — Joe McClernon, Duke Institute for Brain Sciences
    Project: Smoking and Activity Space

  • "Participating in Data+ definitely changed my perception of Data Science research. It was more interdisciplinary than I expected, and the opportunity to work with experts across different fields (Medicine, Civil Engineering, Statistics) was a defining aspect of my Data+ experience."

    — Serge Assad, Biomedical Engineering, Electrical & Computer Engineering

    Classification of Vascular Anomalies using Continuous Doppler Ultrasound and Machine Learning

  • "Before Data+, data science research sounded like a non-collaborative job involving PhD-level statistical concepts. Data+, however, showed me that there is a place for collaborative workers from all different backgrounds (and of all skill levels) in data science research. Participating in Data+ has enriched my technical skills as a coder; I am now able to navigate software and employ coding languages that I was not at all familiar with before the start of the program. Even more valuable, however, are the 'soft' skills I have gained -- specifically, the ability to approach collaboration with an open mind."

    —Susie Choi, Computer Science

    Visualizing Real Time Data from Mobile Health Technologies

  • "My participation in the Data+ program has shown me how to successfully work with a dynamic team. Each of my team members were fundamentally different in course interests and background, yet we came together to create a polished product in which we each were a point person for a specific portion. I have also gained confidence in my ability to learn new skills, as I basically taught myself (through Google and asking teammates) how to program in R over this summer."
    —Devri Adams, Environmental Science

    Data Viz for Long-term Ecological Research and Curricula

  • "I gained valuable program management experience. Given that after the program was over I got hired as a consultant manager at CollegeVine, I'd say it paid off."

    —Stefan Waldschmidt, English

    Quantified Feminism and the Bechdel Test

10 weeks during the summer
2-3 undergraduates per team
1-2 grad student mentors
25 projects sharing ideas and code

Projects

Brooke Erikson (Economics/Computer Science), Alejandro Ortega (Math), and Jade Wu (Computer Science) spent ten weeks developing open-source tools for automatic document categorization, PDF table extraction, and data identification. Their motivating application was provided by Power for All’s Platform for Energy Access Knowledge, and they frequently collaborated with professionals from that organization.

Click here to read the Executive Summary

 

Jake Epstein (Statistics/Economics), Emre Kiziltug (Economics), and Alexander Rubin (Math/Computer Science) spent ten weeks investigating the existence of relative value opportunities in global corporate bond markets. They worked closely with a dataset provided by a leading asset management firm.

Click here for the Executive Summary

Maksym Kosachevskyy (Economics) and Jaehyun Yoo (Statistics/Economics) spent ten weeks understanding temporal patterns in the used construction machinery market and investigating the relationship between these patterns and macroeconomic trends.

They worked closely with a large dataset provided by MachineryTrader.com, and discussed their findings with analytics professionals from a leading asset management firm.

Click here to read the Executive Summary

Alec Ashforth (Economics/Math), Brooke Keene (Electrical & Computer Engineering), Vincent Liu (Electrical & Computer Engineering), and Dezmanique Martin (Computer Science) spent ten weeks helping Duke’s Office of Information Technology explore the development of an “e-advisor” app that recommends co-curricular opportunities to students based on a variety of factors. The team used collaborative and content-based filtering to create a recommender-system prototype in R Shiny.

Click here to read the Executive Summary

Statistical Science majors Eidan Jacob and Justina Zou joined forces with math major Mason Simon to build interactive tools that analyze and visualize the trajectories taken by wireless devices as they move across Duke’s campus and connect to its wireless network. They used de-identified data provided by Duke’s Office of Information Technology, and worked closely with professionals from that office.

Click here for the Executive Summary

Cecily Chase (Applied Math), Brian Nieves (Computer Science), and Harry Xie (Computer Science/Statistics) spent ten weeks understanding how algorithmic approaches can shed light on which data center tasks (“stragglers”) are typically slowed down by unbalanced or limited resources. Working with a real dataset provided by project client Lenovo, the team created a monitoring framework that flags stragglers in real time.

Click here to read the Executive Summary

David Liu (Electrical & Computer Engineering) and Connie Wu (Computer Science/Statistics) spent ten weeks analyzing data about walking speed from the 6th Vital Sign Study.

Integrating study data with public data from the American Community Survey, they built interactive visualization tools that will help researchers understand the study results and the representativeness of study participants.

Click here to read the Executive Summary

Lucas Fagan (Computer Science/Public Policy), Caroline Wang (Computer Science/Math), and Ethan Holland (Statistics/Computer Science) spent ten weeks understanding how data science can contribute to fact-checking methodology. Training on audio data from major news stations, they adapted OpenAI methods to develop a pipeline that moves from audio data to an interface that enables users to search for claims related to other claims that had been previously investigated by fact-checking websites.

This project will continue into the academic year via Bass Connections.

Click here to read the Executive Summary.

A team of students led by Professors Jonathan Mattingly and Gregory Herschlag will investigate gerrymandering in political districting plans. Students will improve on and employ an algorithm to sample the space of compliant redistricting plans for both state and federal districts. The output of the algorithm will be used to detect gerrymandering for a given district plan; this data will be used to analyze and study the efficacy of the idea of partisan symmetry. This work will continue the Quantifying Gerrymandering project, seeking to understand the space of redistricting plans and to find justiciable methods to detect gerrymandering. The ideal team has a mixture of members with programming backgrounds (C, Java, Python), statistical experience including possibly R, mathematical and algorithmic experience, and exposure to political science or other social science fields.

Read the latest updates about this ongoing project by visiting Dr. Mattingly's Gerrymandering blog.

Varun Nair (Mechanical Engineering), Tamasha Pathirathna (Computer Science), Xiaolan You (Computer Science/Statistics), and Qiwei Han (Chemistry) spent ten weeks creating a ground-truthed dataset of electricity infrastructure that can be used to automatically map the transmission and distribution components of the electric power grid. This is the first publicly available dataset of its kind, and will be analyzed during the academic year as part of a Bass Connections team.

Click here to read the Executive Summary

Kimberly Calero (Public Policy/Biology/Chemistry), Alexandra Diaz (Biology/Linguistics), and Cary Shindell (Environmental Engineering) spent ten weeks analyzing and visualizing data about disparities in Social Determinants of Health. Working with data provided by the MURDOCK Study, the American Community Survey, and the Google Places API, the team built a dataset and visualization tool that will assist the MURDOCK research team in exploring health outcomes in Cabarrus County, NC.

Click here to read the Executive Summary

Alexandra Putka (Biology/Neuroscience), John Madden (Economics), and Lucy St. Charles (Global Health/Spanish) spent ten weeks understanding the coverage and timeliness of maternal and pediatric vaccines in Durham. They used data from DEDUCE, the American Community Survey, and the CDC.

This project will continue into the academic year via Bass Connections.

Click here to read the Executive Summary

Dima Fayyad (Electrical & Computer Engineering), Sean Holt (Math), and David Rein (Computer Science/Math) spent ten weeks exploring tools that will operationalize the application of distributed computing methodologies in the analysis of electronic medical records (EMR) at Duke.

As a case study, they applied these systems to a natural language processing project on clinical narratives about growth failure in premature babies.

Click here to read the Executive Summary

Zhong Huang (Sociology) and Nishant Iyengar (Biomedical Engineering) spent ten weeks investigating the clinical profiles of rare metabolic diseases. Working with a large dataset provided by the Duke University Health System, the team used natural language processing techniques and produced an R Shiny visualization that enables clinicians to interactively explore diagnosis clusters.

Click here to read the Executive Summary

Samantha Garland (Computer Science), Grant Kim (Computer Science, Electrical & Computer Engineering), and Preethi Seshadri (Data Science) spent ten weeks exploring factors that influence patient choices when faced with intermediate-stage prostate cancer diagnoses. They used topic modeling in an analysis of a large collection of clinical appointment transcripts.

Click here for the Executive Summary

Nathan Liang (Psychology, Statistics), Sandra Luksic (Philosophy, Political Science), and Alexis Malone (Statistics) began their 10-week project as an open-ended exploration of how women are depicted both physically and figuratively in women's magazines, seeking to consider what role magazines play in the imagined and real lives of women.

Click here to read the Executive Summary

Jennie Wang (Economics/Computer Science) and Blen Biru (Biology/French) spent ten weeks building visualizations of various aspects of the lives of orphaned and separated children at six separate sites in Africa and Asia. The team created R Shiny interactive visualizations of data provided by the Positive Outcomes for Orphans study (POFO).

Click here to read the Executive Summary

Aaron Crouse (Divinity), Mariah Jones (Sociology), Peyton Schafer (Statistics), and Nicholas Simmons (English/Education) spent ten weeks consulting with leadership from the Parent Teacher Association at Glenn Elementary School in Durham. The team set up infrastructure for data collection and visualization that will aid the PTA in forming future strategy.

Click here to read the Executive Summary

In tracing the publication history, geographical spread, and content of “pirated” copies of Daniel Defoe’s Robinson Crusoe, Gabriel Guedes (Math, Global Cultural Studies), Lucian Li (Computer Science, History), and Orgil Batzaya (Math, Computer Science) explored the complications of looking at a data set that saw drastic changes over the last three centuries in terms of spelling and grammar, which offered new challenges to data cleanup. By asking questions about the effectiveness of “distant reading” techniques for comparing thousands of different editions of Robinson Crusoe, the students learned how to think about the appropriateness of myriad computational methods like doc2vec and topic modeling. Through these methods, the students started to ask, at what point does one start seeing patterns that were invisible at a human scale of reading (reading one book at a time)? While the project did not definitively answer these questions, it did provide paths for further inquiry.

The team published their results at: https://orgilbatzaya.github.io/pirating-texts-site/

Click here for the Executive Summary

Melanie Lai Wai (Statistics) and Saumya Sao (Global Health, Gender Studies) spent ten weeks developing a platform which enables users to understand factors that influence contraceptive use and discontinuation. Their work combined data from the Demographic and Health Surveys contraceptive calendar with open data about reproductive health and social indicators from the World Bank, World Health Organization, and World Population Prospects. This project will continue into the academic year via Bass Connections.

Click here to read the Executive Summary

Bob Ziyang Ding (Math/Stats) and Daniel Chaofan Tao (ECE) spent ten weeks understanding how deep learning techniques can shed light on single cell analysis. Working with a large set of single-cell sequencing data, the team built an autoencoder pipeline and a device that will allow biologists to interactively visualize their own data.

Click here to read the Executive Summary

Ashley Murray (Chemistry/Math), Brian Glucksman (Global Cultural Studies), and Michelle Gao (Statistics/Economics) spent 10 weeks analyzing how the meaning and use of the word “poverty” changed in presidential documents from the 1930s to the present. The students found that American presidential rhetoric about poverty has shifted in measurable ways over time. Presidential rhetoric, however, doesn’t necessarily affect policy change. As Michelle Gao explained, “The statistical methods we used provided another more quantitative way of analyzing the text. The database had around 130,000 documents, which is pretty impossible to read one by one and get all the poverty related documents by brute force. As a result, web-scraping and word filtering provided a more efficient and systematic way of extracting all the valuable information while minimizing human errors.” Through techniques such as linear regression, machine learning, and image analysis, the team effectively analyzed large swaths of textual and visual data. This approach allowed them to zero in on significant documents for closer and more in-depth analysis, paying particular attention to documents by presidents such as Franklin Delano Roosevelt or Lyndon B. Johnson, both leaders in what LBJ famously called “The War on Poverty.”

Click here for the Executive Summary

Natalie Bui (Math/Economics), David Cheng (Electrical & Computer Engineering), and Cathy Lee (Statistics) spent ten weeks helping the Prospect Management and Analytics office of Duke Development understand how a variety of analytic techniques might enhance their workflow. The team used topic modeling and named entity recognition to develop a pipeline that clusters potential prospects into useful categories.

Click here to read the Executive Summary

Tatanya Bidopia (Psychology, Global Health), Matthew Rose (Computer Science), and Joyce Yoo (Public Policy/Psychology) spent ten weeks doing a data-driven investigation of the relationship between mental health training of law enforcement officers and key outcomes such as incarceration, recidivism, and referrals for treatment. They worked closely with the Crisis Intervention Team, and they used jail data provided by the Sheriff’s Office of Durham County.

Click here to read the Executive Summary

Past Projects

Sophie Guo, Math/PoliSci major, Bridget Dou, ECE/CompSci major, Sachet Bangia, Econ/CompSci major, and Christy Vaughn spent ten weeks studying different procedures for drawing congressional boundaries, and quantifying the effects of these procedures on the fairness of actual election results.

Anna Vivian (Physics, Art History) and Vinai Oddiraju (Stats) spent ten weeks working closely with the director of the Durham Neighborhood Compass. Their goal was to produce metrics for things like ambient stress and neighborhood change, to visualize these metrics within the Compass system, and to interface with a variety of community stakeholders in their work.

ECE majors Mitchell Parekh and Yehan (Morton) Mo, along with IIT student Nikhil Tank, spent ten weeks understanding parking behavior at Duke. They worked closely with the Parking and Transportation Office, as well as with Vice President for Administration Kyle Cavanaugh.