Data+ Wraps Up Fourth Year of Data Projects

2018 Data+ Poster Session. August 3, 2018 Gross Hall

Data+ is a 10-week summer program at Duke in which students explore new data-driven approaches to interdisciplinary challenges. On August 3, 2018, Data+ celebrated its fourth year with its annual poster session in Gross Hall’s Energy Hub Atrium. Over the summer, Data+ participants joined small project teams and collaborated with other teams in a communal environment based in Gross Hall. Twenty-four project teams tackled data challenges involving health data, text, voting records, wireless mapping data, economic and financial analytics, and more. The Data+ program showcases many ways to apply data science in the real world and gives its participants a new way of exploring data. Here’s what some of this year’s Data+ students had to say about the program:

“I got hands on experience with buzzwords and ideas that I didn't understand before (and still am working on). I developed my intuition for when to use computers to advance humanities work and when not to. Learned how to overcome challenges in team-work and group-organizing. I also landed a research position with a philosophy professor doing more text mining and citation analysis, skills I would not have without Data+.” – Sandra Luksic ’20 (Philosophy, Political Science), Women’s Spaces

“Data+ gave me an opportunity to apply some of the methods and skills that I've learned in the first year of my master's program to an interesting and very relevant project. It can be hard to see the 'real-world' value of concepts learned in school, even in an applied field like statistics; through Data+, I was able to run MCMC sampling, learn how to use QGIS to manipulate spatial data, and improve my knowledge of (mostly) Python and (a little) R. I learned not just how to answer questions, but also how to ask questions.” – Lisa Lebovici M (Statistics), Gerrymandering and the Extent of Democracy in America

“Data+ made me realize that data science research is surprisingly pragmatic. I came in thinking that success in data science came down to statistical brilliance, but in reality, the key to success is flexibility. 99 percent of the methods we tried did not work, and the only way we moved forward was changing our perspective of the problem.” – Grant Kim ’21 (Computer Science, Electrical and Computer Engineering), Complex Decisions, Real Numbers: Medical Decision Making

Each Data+ project is sponsored by a faculty member or an industry representative with a data problem or question that the project team tackles over the summer. Teams often accomplish so much that their sponsors continue working with the students after the program ends, on papers, prototype applications, and further analysis. Our Data+ clients reported that they are thrilled with this year’s results:

“I am particularly impressed by how students with different (particularly non-computing) backgrounds were able to contribute to various projects in very meaningful ways.” – Jun Yang, Professor of Computer Science, Duke University, Data & Technology for Fact-Checking project

“We greatly enjoyed working with our Data+ team, and were very pleased with the end results of their work. Highly recommend this program for anyone who is looking to dip their toes into applying data analytics in their world.” – Richard Biever, Senior Director, Office of Information Technology, Duke Wireless Data and Co-Curricular Pathways E-Advisor projects

“The team of students had excellent technical and communications skills. I was impressed at how self-driven they were, and their ability to come up with solutions to some inevitable data issues that they encountered. Their final output was both better presented, and more directly applicable, than I expected at the outset.” – Emma Rasiel, Professor of Economics, Duke University, Data and the Global Corporate Bond Market project

Duke Forge, a transdisciplinary group including clinical leadership, quantitative expertise, operational management, and collaborator advisory, sponsored the Improving the Machine Learning Pipeline at Duke team. The team developed a tool to operationalize the application of distributed computing methodologies in the analysis of electronic medical records (EMR) at Duke. As a case study, they applied these systems to a Natural Language Processing project on clinical narratives about growth failure in premature babies. The team presented their work at the Forge’s Neonatal Intensive Care Unit (NICU) prediction stakeholder collective meeting on July 31, 2018. The presentation was very well received and prompted a lively discussion about how this data could be used to predict growth failure and other health markers in babies. Data+ looks forward to working with Duke Forge more in the future!

Sean Holt of the Improving the Machine Learning Pipeline at Duke team explains his team's work at the 2018 Data+ poster session in Gross Hall on August 3, 2018

At the most recent Durham Crisis Intervention Team (CIT) Collaborative meeting, held in August 2018, CIT announced that it was so pleased with the Data+ Mental Health Interventions project teams’ two years of data analysis that it will name the Duke Data+ program its “Community Partner Agency of the Year” at this year’s end-of-year celebration banquet on December 7, 2018.

“Evaluation and Research is a Sustaining Core Element of Crisis Intervention Team. We are very fortunate to be in a position to collaborate with Duke’s Data+ Program. The TEAM has done a phenomenal job with evaluation data from 2002 to 2017. We are looking forward to this continued relationship,” says Major Elijah Bazemore of the Durham Sheriff’s Department and Crisis Intervention Team.

Durham Sheriff's Department and their project team, Mental Health Interventions with the Durham Police (Year 2)

Four Data+ project teams are continuing as Bass Connections projects during the 2018–2019 academic year: Gerrymandering and the Extent of Democracy in America, Vaccine Hesitancy & Uptake, Big Data for Reproductive Health, and Data & Technology for Fact-Checking. These Data+ teams laid important groundwork for the new team members joining this fall, and will continue to work on their projects over the next year.

The Big Data for Reproductive Health (BD4RH) team, led by Amy Finnegan and Megan Huchko, set out to build a web-based application that allows users to visualize and analyze contraceptive calendar data from the Demographic and Health Surveys (DHS). To ground their project, they did a mapping exercise to identify currently available tools, noting core elements they liked and key areas a new tool could improve. Using these findings and user feedback from various stakeholders in the field, they created a website that hosts four different data visualization methods to interpret trends in contraceptive use from the DHS contraceptive calendar. The site currently uses Kenya data to demonstrate efficacy, but more datasets will be added soon. Although Melanie Lai Wai had worked on coding projects in the past, Saumya Sao came into the summer with very little coding experience. Sao enjoyed learning how to work with R, and both students look forward to continuing the project through their Bass Connections team this fall. The Bass Connections team will continue to improve the website, gain a deeper understanding of machine learning and big data analytics, and engage with key stakeholders to ensure maximal usability for the tool.

The Gerrymandering team has already had some major successes this year in seeing their work applied to the Common Cause v. Rucho federal court case in North Carolina, where a three-judge panel ruled again on August 27, 2018 that the state’s districts had been unconstitutionally gerrymandered when they were redrawn.

Saumya Sao and Melanie Lai Wai present their Data+ project poster, Big Data for Reproductive Health, which will continue as a Bass Connections project this academic year

iiD is now accepting project proposals for the summer of 2019! We are especially interested in proposals that involve a partner from outside the academy, or a faculty member from a different discipline. We also encourage proposals that involve previously untested ideas or unanalyzed datasets, and we hope that the Data+ team can contribute important proof-of-principle work that may lead to more substantial faculty work and/or connections in the future. We also welcome proposals in which the undergraduates create tools that might be used in the classroom or that might facilitate community engagement with data and data-driven questions.

For more information or to discuss a potential proposal, please contact Paul Bendich (bendich@math.duke.edu).