Data+

Data+ is a 10-week summer research experience that welcomes Duke undergraduates interested in exploring new data-driven approaches to interdisciplinary challenges. Students join small project teams, collaborating with other teams in a communal environment. They learn how to marshal, analyze, and visualize data, while gaining broad exposure to the modern world of data science.

 

 

Browse 2018 projects 

Become a part of Data+: Submit a project proposal

We're currently accepting proposals from faculty and our partners for 2019 Data+ projects. The deadline for completing this application is November 5th, 2018, at 5 p.m. Please email your completed application to Ariel Dawn (ariel.dawn@duke.edu). If you would like help in developing your proposal, please contact Paul Bendich at bendich@math.duke.edu.

Submit a project proposal (PDF)

  • "The Data+ team created two new datasets that we'll immediately deploy as a part of our core research efforts and will serve as the basis for an upcoming Bass Connections in Energy project. The outputs will be used towards two new research projects on energy infrastructure and access in developing countries, and will serve as the ground truth data for developing machine learning techniques for identifying energy infrastructure and access. The students were fantastic - hardworking, passionate about their work, and all-around wonderful people to work with."

    —Kyle Bradbury, Lecturing Fellow and Managing Director, Duke Energy Data Analytics Lab

    Electricity Access in Developing Countries from Aerial Imagery

  • "Our DATA+ team did wonderful work. I’d wanted a software tool for quantifying human mobility data using network analysis for five years—the team got it done in eight weeks! Paul and Ashlee run a fantastic program that requires minimal oversight on the part of faculty but with maximum returns. Sign me up for next year!"

    — Joe McClernon, Duke Institute for Brain Sciences
    Project: Smoking and Activity Space

  • "The project mentor was fantastic. The three students I worked with were superb. We were able to make great progress that will lead to journal publications and grant proposals."

    —Wilkins Aquino, Professor, Duke Department of Civil and Environmental Engineering

    Classification of Vascular Anomalies using Continuous Doppler Ultrasound and Machine Learning

  • "Participating in Data+ definitely changed my perception of Data Science research. It was more interdisciplinary than I expected, and the opportunity to work with experts across different fields (Medicine, Civil Engineering, Statistics) was a defining aspect of my Data+ experience."

    — Serge Assad, Biomedical Engineering, Electrical & Computer Engineering

    Classification of Vascular Anomalies using Continuous Doppler Ultrasound and Machine Learning

  • "Before Data+, data science research sounded like a non-collaborative job involving PhD-level statistical concepts. Data+, however, showed me that there is a place for collaborative workers from all different backgrounds (and of all skill levels) in data science research. Participating in Data+ has enriched my technical skills as a coder; I am now able to navigate software and employ coding languages that I was not at all familiar with before the start of the program. Even more valuable, however, are the "soft" skills I have gained -- specifically, the ability to approach collaboration with an open mind."

    —Susie Choi, Computer Science

    Visualizing Real Time Data from Mobile Health Technologies

  • "My participation in the Data+ program has shown me how to successfully work with a dynamic team. Each of my team members were fundamentally different in course interests and background, yet we came together to create a polished product in which we each were a point person for a specific portion. I have also gained confidence in my ability to learn new skills, as I basically taught myself (through Google and asking teammates) how to program in R over this summer."
    —Devri Adams, Environmental Science

    Data Viz for Long-term Ecological Research and Curricula

  • "I gained valuable program management experience. Given that after the program was over I got hired as a consultant manager at CollegeVine, I'd say it paid off."

    —Stefan Waldschmidt, English

    Quantified Feminism and the Bechdel Test

10 weeks during the summer
2-3 undergraduates per team
1-2 grad student mentors
25 projects sharing ideas and code

Projects

Brooke Erikson (Economics/Computer Science), Alejandro Ortega (Math), and Jade Wu (Computer Science) spent ten weeks developing open-source tools for automatic document categorization, PDF table extraction, and data identification. Their motivating application was provided by Power for All’s Platform for Energy Access Knowledge, and they frequently collaborated with professionals from that organization.

Click here to read the Executive Summary

 

Jake Epstein (Statistics/Economics), Emre Kiziltug (Economics), and Alexander Rubin (Math/Computer Science) spent ten weeks investigating the existence of relative value opportunities in global corporate bond markets. They worked closely with a dataset provided by a leading asset management firm.

Click here for the Executive Summary

Maksym Kosachevskyy (Economics) and Jaehyun Yoo (Statistics/Economics) spent ten weeks understanding temporal patterns in the used construction machinery market and investigating the relationship between these patterns and macroeconomic trends.

They worked closely with a large dataset provided by MachineryTrader.com, and discussed their findings with analytics professionals from a leading asset management firm.

Click here to read the Executive Summary

Alec Ashforth (Economics/Math), Brooke Keene (Electrical & Computer Engineering), Vincent Liu (Electrical & Computer Engineering), and Dezmanique Martin (Computer Science) spent ten weeks helping Duke’s Office of Information Technology explore the development of an “e-advisor” app that recommends co-curricular opportunities to students based on a variety of factors. The team used collaborative and content-based filtering to create a recommender-system prototype in R Shiny.

Click here to read the Executive Summary

Statistical Science majors Eidan Jacob and Justina Zou joined forces with math major Mason Simon to build interactive tools that analyze and visualize the trajectories taken by wireless devices as they move across Duke’s campus and connect to its wireless network. They used de-identified data provided by Duke’s Office of Information Technology, and worked closely with professionals from that office.

Click here for the Executive Summary

 

Cecily Chase (Applied Math), Brian Nieves (Computer Science), and Harry Xie (Computer Science/Statistics) spent ten weeks understanding how algorithmic approaches can shed light on which data center tasks (“stragglers”) are typically slowed down by unbalanced or limited resources. Working with a real dataset provided by project client Lenovo, the team created a monitoring framework that flags stragglers in real time.

Click here to read the Executive Summary

David Liu (Electrical & Computer Engineering) and Connie Wu (Computer Science/Statistics) spent ten weeks analyzing data about walking speed from the 6th Vital Sign Study.

Integrating study data with public data from the American Community Survey, they built interactive visualization tools that will help researchers understand the study results and the representativeness of study participants.

Click here to read the Executive Summary

Lucas Fagan (Computer Science/Public Policy), Caroline Wang (Computer Science/Math), and Ethan Holland (Statistics/Computer Science) spent ten weeks understanding how data science can contribute to fact-checking methodology. Training on audio data from major news stations, they adapted OpenAI methods to develop a pipeline that moves from audio data to an interface that enables users to search for claims related to other claims that had been previously investigated by fact-checking websites.

This project will continue into the academic year via Bass Connections.

Click here to read the Executive Summary.

A team of students led by Professors Jonathan Mattingly and Gregory Herschlag will investigate gerrymandering in political districting plans. Students will improve on and employ an algorithm to sample the space of compliant redistricting plans for both state and federal districts. The output of the algorithm will be used to detect gerrymandering for a given district plan; this data will be used to analyze and study the efficacy of the idea of partisan symmetry. This work will continue the Quantifying Gerrymandering project, seeking to understand the space of redistricting plans and to find justiciable methods to detect gerrymandering. The ideal team has a mixture of members with programming backgrounds (C, Java, Python), statistical experience including possibly R, mathematical and algorithmic experience, and exposure to political science or other social science fields.

Read the latest updates about this ongoing project by visiting Dr. Mattingly's Gerrymandering blog.

Varun Nair (Mechanical Engineering), Tamasha Pathirathna (Computer Science), Xiaolan You (Computer Science/Statistics), and Qiwei Han (Chemistry) spent ten weeks creating a ground-truthed dataset of electricity infrastructure that can be used to automatically map the transmission and distribution components of the electric power grid. This is the first publicly available dataset of its kind, and will be analyzed during the academic year as part of a Bass Connections team.

Click here to read the Executive Summary

Kimberly Calero (Public Policy/Biology/Chemistry), Alexandra Diaz (Biology/Linguistics), and Cary Shindell (Environmental Engineering) spent ten weeks analyzing and visualizing data about disparities in Social Determinants of Health. Working with data provided by the MURDOCK Study, the American Community Survey, and the Google Places API, the team built a dataset and visualization tool that will assist the MURDOCK research team in exploring health outcomes in Cabarrus County, NC.

Click here to read the Executive Summary

Alexandra Putka (Biology/Neuroscience), John Madden (Economics), and Lucy St. Charles (Global Health/Spanish) spent ten weeks understanding the coverage and timeliness of maternal and pediatric vaccines in Durham. They used data from DEDUCE, the American Community Survey, and the CDC.

This project will continue into the academic year via Bass Connections.

Click here to read the Executive Summary

Dima Fayyad (Electrical & Computer Engineering), Sean Holt (Math), and David Rein (Computer Science/Math) spent ten weeks exploring tools that will operationalize the application of distributed computing methodologies in the analysis of electronic medical records (EMR) at Duke.

As a case study, they applied these systems to a natural language processing project on clinical narratives about growth failure in premature babies.

Click here to read the Executive Summary

Zhong Huang (Sociology) and Nishant Iyengar (Biomedical Engineering) spent ten weeks investigating the clinical profiles of rare metabolic diseases. Working with a large dataset provided by the Duke University Health System, the team used natural language processing techniques and produced an R Shiny visualization that enables clinicians to interactively explore diagnosis clusters.

Click here to read the Executive Summary

Samantha Garland (Computer Science), Grant Kim (Computer Science, Electrical & Computer Engineering), and Preethi Seshadri (Data Science) spent ten weeks exploring factors that influence patient choices when faced with intermediate-stage prostate cancer diagnoses. They used topic modeling in an analysis of a large collection of clinical appointment transcripts.

Click here for the Executive Summary

Nathan Liang (Psychology, Statistics), Sandra Luksic (Philosophy, Political Science), and Alexis Malone (Statistics) spent ten weeks using tools from text and image analytics to understand the evolving representations of women on magazine covers. They worked with a large collection of magazine covers from Duke’s library archive.

Click here to read the Executive Summary

Jennie Wang (Economics/Computer Science) and Blen Biru (Biology/French) spent ten weeks building visualizations of various aspects of the lives of orphaned and separated children at six separate sites in Africa and Asia. The team created R Shiny interactive visualizations of data provided by the Positive Outcomes for Orphans study (POFO).

Click here to read the Executive Summary

Aaron Crouse (Divinity), Mariah Jones (Sociology), Peyton Schafer (Statistics), and Nicholas Simmons (English/Education) spent ten weeks consulting with leadership from the Parent Teacher Association at Glenn Elementary School in Durham. The team set up infrastructure for data collection and visualization that will aid the PTA in forming future strategy.

Click here to read the Executive Summary

 

 

Gabriel Guedes (Math, Global Cultural Studies), Lucian Li (Computer Science, History), and Orgil Batzaya (Math, Computer Science) spent ten weeks using text analytics and interactive mapping tools to understand the geographic spread of 1,482 versions and editions of the Robinson Crusoe story. They worked with data provided by HathiTrust, the University of Florida, and the Internet Archive.

Click here for the Executive Summary

Melanie Lai Wai (Statistics) and Saumya Sao (Global Health, Gender Studies) spent ten weeks developing a platform which enables users to understand factors that influence contraceptive use and discontinuation. Their work combined data from the Demographic and Health Surveys contraceptive calendar with open data about reproductive health and social indicators from the World Bank, World Health Organization, and World Population Prospects. This project will continue into the academic year via Bass Connections.

Click here to read the Executive Summary

Bob Ziyang Ding (Math/Stats) and Daniel Chaofan Tao (ECE) spent ten weeks understanding how deep learning techniques can shed light on single cell analysis. Working with a large set of single-cell sequencing data, the team built an autoencoder pipeline and a device that will allow biologists to interactively visualize their own data.

Click here to read the Executive Summary

Ashley Murray (Chemistry/Math), Brian Glucksman (Global Cultural Studies), and Michelle Gao (Statistics/Economics) spent ten weeks using sentiment and image analysis to understand semantic shifts in the way American presidents have used the term “poverty,” stretching from the 1930s to the present day.  

In addition to many YouTube videos of presidential speeches, they made use of a large archive of presidential addresses provided by the American Presidency Project.

Click here for the Executive Summary

Natalie Bui (Math/Economics), David Cheng (Electrical & Computer Engineering), and Cathy Lee (Statistics) spent ten weeks helping the Prospect Management and Analytics office of Duke Development understand how a variety of analytic techniques might enhance their workflow. The team used topic modeling and named entity recognition to develop a pipeline that clusters potential prospects into useful categories.

Click here to read the Executive Summary

Tatanya Bidopia (Psychology, Global Health), Matthew Rose (Computer Science), and Joyce Yoo (Public Policy/Psychology) spent ten weeks doing a data-driven investigation of the relationship between mental health training of law enforcement officers and key outcomes such as incarceration, recidivism, and referrals for treatment. They worked closely with the Crisis Intervention Team, and they used jail data provided by the Sheriff’s Office of Durham County.

Click here to read the Executive Summary

 

Past Projects

Sophie Guo (Math/PoliSci), Bridget Dou (ECE/CompSci), Sachet Bangia (Econ/CompSci), and Christy Vaughn spent ten weeks studying different procedures for drawing congressional boundaries, and quantifying the effects of these procedures on the fairness of actual election results.

Anna Vivian (Physics, Art History) and Vinai Oddiraju (Stats) spent ten weeks working closely with the director of the Durham Neighborhood Compass. Their goal was to produce metrics for things like ambient stress and neighborhood change, to visualize these metrics within the Compass system, and to interface with a variety of community stakeholders in their work.

Maddie Katz (Global Health/Evolutionary Anthropology), Parker Foe (Math/Spanish, Smith College), and Tony Li (Math, Cornell) spent ten weeks analyzing data from the National Transgender Discrimination Survey. Their goal was to understand how the discrimination faced by the trans community is realized on a state, regional, and national level, and to partner with advocacy organizations around their analysis.