Research

Research projects at iiD focus on building connections. We encourage crosspollination of ideas across disciplines, and to develop new forms of collaboration that will advance research and education across the full spectrum of disciplines at Duke. The topics below show areas of research focus at iiD. See all of our research.

Brooke Erikson (Economics/Computer Science), Alejandro Ortega (Math), and Jade Wu (Computer Science) spent ten weeks developing open-source tools for automatic document categorization, PDF table extraction, and data identification. Their motivating application was provided by Power for All’s Platform for Energy Access Knowledge, and they frequently collaborated with professionals from that organization.

Click here to read the Executive Summary

 

Jake Epstein (Statistics/Economics), Emre Kiziltug (Economics), and Alexander Rubin (Math/Computer Science) spent ten weeks investigating the existence of relative value opportunities in global corporate bond markets. They worked closely with a dataset provided by a leading asset management firm.

Click here for the Executive Summary

Maksym Kosachevskyy (Economics) and Jaehyun Yoo (Statistics/Economics) spent ten weeks understanding temporal patterns in the used construction machinery market and investigating the relationship between these patterns and macroeconomic trends.

They worked closely with a large dataset provided by MachineryTrader.com, and discussed their findings with analytics professionals from a leading asset management firm.

Click here to read the Executive Summary

Alec Ashforth (Economics/Math), Brooke Keene (Electrical & Computer Engineering), Vincent Liu (Electrical & Computer Engineering), and Dezmanique Martin (Computer Science) spent ten weeks helping Duke’s Office of Information Technology explore the development of an “e-advisor” app that recommends co-curricular opportunities to students based on a variety of factors. The team used collaborative and content-based filtering to create a recommender-system prototype in R Shiny.

Click here to read the Executive Summary

Statistical Science majors Eidan Jacob and Justina Zou joined forces with math major Mason Simon built interactive tools that analyze and visualize the trajectories taken by wireless devices as they move across Duke’s campus and connect to its wireless network. They used de-identified data provided by Duke’s Office of Information Technology, and worked closely with professionals from that office.

Click here for the Executive Summary

Cecily Chase (Applied Math), Brian Nieves (Computer Science), and Harry Xie (Computer Science/Statistics) spent ten weeks understanding how algorithmic approaches can shed light on which data center tasks (“stragglers”) are typically slowed down by unbalanced or limited resources. Working with a real dataset provided by project clients Lenovo, the team created a monitoring framework that flags stragglers in real time.

Click here to read the Executive Summary

David Liu (Electrical Computer Engineering) and Connie Wu (Computer Science/Statistics) spent ten weeks analyzing data about walking speed from the 6th Vital Sign Study.

Integrating study data with public data from the American Community Survey, they built interactive visualization tools that will help researchers understand the study results and the representativeness of study participants.

Click here to read the Executive Summary

Lucas Fagan (Computer Science/Public Policy), Caroline Wang (Computer Science/Math), and Ethan Holland (Statistics/Computer Science) spent ten weeks understanding how data science can contribute to fact-checking methodology. Training on audio data from major news stations, they adapted OpenAI methods to develop a pipeline that moves from audio data to an interface that enables users to search for claims related to other claims that had been previously investigated by fact-checking websites.

This project will continue into the academic year via Bass Connections.

Click here to read the Executive Summary.

A team of students led by Professors Jonathan Mattingly and Gregory Herschlag will investigate gerrymandering in political districting plans.  Students will improve on and employ an algorithm to sample the space of compliant redistricting plans for both state and federal districts.  The output of the algorithm will be used to detect gerrymandering for a given district plan; this data will be used to analyze and study the efficacy of the idea of partisan symmetry.  This work will continue the Quantifying Gerrymandering project, seeking to understand the space of redistricting plans and to find justiciable methods to detect gerrymandering. The ideal team has a mixture of members with programing backgrounds (C, Java, Python), statistical experience including possibly R, mathematical and algorithmic experience, and exposure to political science or other social science fields.

Read the latest updates about this ongoing project by visiting Dr. Mattingly's Gerrymandering blog.

Varun Nair (Mechanical Engineering), Tamasha Pathirathna (Computer Science), Xiaolan You (Computer Science/Statistics), and Qiwei Han (Chemistry) spent ten weeks creating a ground-truthed dataset of electricity infrastructure that can be used to automatically map the transmission and distribution components of the electric power grid. This is the first publicly available dataset of its kind, and will be analyzed during the academic year as part of a Bass Connections team.

Click here to read the Executive Summary

Kimberly Calero (Public Policy/Biology/Chemistry), Alexandra Diaz (Biology/Linguistics), and Cary Shindell (Environmental Engineering) spent ten weeks analyzing and visualizing data about disparities in Social Determinants of Health. Working with data provided by the MURDOCK Study, the American Community Survey, and the Google Places API, the team built a dataset and visualization tool that will assist the MURDOCK research team in exploring health outcomes in Cabarrus County, NC.

Click here to read the Executive Summary

Alexandra Putka (Biology/Neuroscience), John Madden (Economics), and Lucy St. Charles (Global Health/Spanish) spent ten weeks understanding the coverage and timeliness of maternal and pediatric vaccines in Durham. They used data from DEDUCE, the American Community Survey, and the CDC.

This project will continue into the academic year via Bass Connections.

Click here to read the Executive Summary

Dima Fayyad (Electrical & Computer Engineering), Sean Holt (Math), David Rein (Computer Science/Math) spent ten weeks exploring tools that will operationalize the application of distributed computing methodologies in the analysis of electronic medical records (EMR) at Duke.

As a case study, they applied these systems to an Natural Language Processing project on clinical narratives about growth failure in premature babies.

Click here to read the Executive Summary

Zhong Huang (Sociology) and Nishant Iyengar (Biomedical Engineering) spent ten weeks investigating the clinical profiles of rare metabolic diseases. Working with a large dataset provided by the Duke University Health System, the team used natural language processing techniques and produced an R Shiny visualization that enables clinicians to interactively explore diagnosis clusters.

Click here to read the Executive Summary

Samantha Garland (Computer Science), Grant Kim (Computer Science, Electrical & Computer Engineering), and Preethi Seshadri (Data Science) spent ten weeks exploring factors that influence patient choices when faced with intermediate-stage prostate cancer diagnoses. They used topic modeling in an analysis of a large collection of clinical appointment transcripts.

Click here for the Executive Summary

Nathan Liang (Psychology, Statistics), Sandra Luksic (Philosophy, Political Science),and Alexis Malone (Statistics) began their 10-week project as an open-ended exploration how women are depicted both physically and figuratively in women's magazines, seeking to consider what role magazines play in the imagined and real lives of women.

Click here to read the Executive Summary

Jennie Wang (Economics/Computer Science) and Blen Biru (Biology/French) spent ten weeks building visualizations of various aspects of the lives of orphaned and separated children at six separate sites in Africa and Asia. The team created R Shiny interactive visualizations of data provided by the Positive Outcomes for Orphans study (POFO).

Click here to read the Executive Summary

Aaron Crouse (Divinity), Mariah Jones (Sociology), Peyton Schafer (Statistics), and Nicholas Simmons (English/Education) spent ten weeks consulting with leadership from the Parents Teacher Association at Glenn Elementary School in Durham. The team set up infrastructure for data collection and visualization that will aid the PTA in forming future strategy.

Click here to read the Executive Summary

In tracing the publication history, geographical spread, and content of “pirated” copies of Daniel Defoe’s Robinson Crusoe, Gabriel Guedes (Math, Global Cultural Studies), Lucian Li (Computer Science, History), and Orgil Batzaya (Math, Computer Science) explored the complications of looking at a data set that saw drastic changes over the last three centuries in terms of spelling and grammar, which offered new challenges to data cleanup. By asking questions of the effectiveness of “distant reading” techniques for comparing thousands of different editions of Robinson Crusoe, the students learned how to think about the appropriateness of myriad computational methods like doc2vec and topic modeling. Through these methods, the students started to ask, at what point does one start seeing patterns that were invisible at a human scale of reading (reading one book at a time)? While the project did not definitively answer these questions, it did provide paths for further inquiry.

The team published their results at: https://orgilbatzaya.github.io/pirating-texts-site/

Click here for the Executive Summary

Melanie Lai Wai (Statistics) and Saumya Sao (Global Health, Gender Studies) spent ten weeks developing a platform which enables users to understand factors that influence contraceptive use and discontinuation. Their work combined data from the Demographic and Health Surveys contraceptive calendar with open data about reproductive health and social indicators from the World Bank, World Health Organization, and World Population Prospects. This project will continue into the academic year via Bass Connections.

Click here to read the Executive Summary

Bob Ziyang Ding (Math/Stats) and Daniel Chaofan Tao (ECE) spent ten weeks understanding how deep learning techniques can shed light on single cell analysis. Working with a large set of single-cell sequencing data, the team built an autoencoder pipeline and a device that will allow biologists to interactively visualize their own data.

Click here to read the Executive Summary

Ashley Murray (Chemistry/Math), Brian Glucksman (Global Cultural Studies), and Michelle Gao (Statistics/Economics) spent 10 weeks analyzing how meaning and use of the work “poverty” changed in presidential documents from the 1930s to the present. The students found that American presidential rhetoric about poverty has shifted in measurable ways over time. Presidential rhetoric, however, doesn’t necessarily affect policy change. As Michelle Gao explained, “The statistical methods we used provided another more quantitative way of analyzing the text. The database had around 130,000 documents, which is pretty impossible to read one by one and get all the poverty related documents by brute force. As a result, web-scraping and word filtering provided a more efficient and systematic way of extracting all the valuable information while minimizing human errors.” Through techniques such as linear regression, machine learning, and image analysis, the team effectively analyzed large swaths of textual and visual data. This approach allowed them to zero in on significant documents for closer and more in-depth analysis, paying particular attention to documents by presidents such as Franklin Delano Roosevelt or Lyndon B. Johnson, both leaders in what LBJ famously called “The War on Poverty.”

Click Here for the Executive Summary

Natalie Bui (Math/Economics), David Cheng (Electrical & Computer Engineering), and Cathy Lee (Statistics) spent ten weeks helping the Prospect Management and Analytics office of Duke Development understand how a variety of analytic techniques might enhance their workflow. The team used topic modeling and named entity recognition to develop a pipeline that clusters potential prospects into useful categories.

Click here to read the Executive Summary

Tatanya Bidopia (Psychology, Global Health), Matthew Rose (Computer Science), Joyce Yoo (Public Policy/Psychology) spent ten weeks doing a data-driven investigation of the relationship between mental health training of law enforcement officers and key outcomes such as incarceration, recidivism, and referrals for treatment. They worked closely with the Crisis Intervention Team, and they used jail data provided by the Sheriff’s Office of Durham County.

Click here to read the Executive Summary