Diagnosing Diabetes and Predicting Complications

Project Summary

Priya Sarkar (Computer Science), Lily Zerihun (Biology and Global Health), and Anqi Zhang (Biostatistics) spent ten weeks using Duke Electronic Medical Record (EMR) data to identify subgroups of diabetic patients and to predict future complications associated with Type II diabetes.

Themes and Categories

Project Results

The team used t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce the dimensionality of data on prescribed medications, medical diagnoses, laboratory tests, and patient outcomes. They then applied K-means clustering to identify meaningful clusters of similar patients and explored what made the patients in each cluster similar. The team also built and tested statistical models to predict 13 common complications in diabetic patients and found high predictive accuracy for several of them when drawing on the rich data available in the EMR.
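As a rough illustration of the clustering step only, the sketch below runs K-means (Lloyd's algorithm) in plain Python. It is not the team's code: the points are synthetic stand-ins for two-dimensional embedding coordinates such as t-SNE might produce, and the function name `kmeans`, the choice of `k = 2`, and the seeds are all assumptions made for the example.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Synthetic data (an assumption): two well-separated groups of 50 points each,
# standing in for patients placed in a 2-D embedding.
data_rng = random.Random(42)
group_a = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
group_b = [(data_rng.gauss(15, 1), data_rng.gauss(15, 1)) for _ in range(50)]
points = group_a + group_b

centroids, labels = kmeans(points, k=2)
print("centroids:", centroids)
print("cluster sizes:", [labels.count(j) for j in range(2)])
```

In practice the number of clusters is not known in advance, so a real analysis would compare several values of `k` and inspect which medications, diagnoses, or lab results distinguish each cluster.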


Faculty Sponsor

Project Manager

"Data+ provided an invaluable opportunity to work with motivated, hard-working students on exciting and challenging data problems. I learned so much about working with others, communicating effectively, and managing students with a variety of backgrounds. Though each of my students had a different level of statistics and coding experience, they made mentoring so easy with their hard work and interest in the project, as well as the effective organization of the summer as a whole. It was a great experience that I highly recommend to other graduate students!" Liz Lorenzi, Ph.D. Candidate, Statistics


  • Lillian Zerihun, Duke University Biology & Global Health
  • Priya Sarkar, Duke University Computer Science
  • Anqi Zhang, Duke University Biostatistics

Disciplines Involved

  • Biostatistics
  • Public Health
  • All quantitative STEM


Related People

Related Projects

A team of students who worked together for a semester in the Mission Driven Startups class will obtain and analyze data to create a predictive maintenance model for F-15E fighter jets from Seymour Johnson Air Force Base. Using data provided by the base, the Data+ team will evaluate the relationship between unscheduled maintenance and external factors such as weather, sortie hours between repairs, and failure frequency of aircraft components. These findings will then feed into a predictive maintenance model to enhance the Air Force crew’s ability to anticipate maintenance needs, helping to minimize unscheduled aircraft downtime.


Faculty Lead: Dr. Emma Rasiel

Client Lead: Lt. Devon Burger

Project Manager: Vignesh Kumaresan

A team of students, led by Electrical and Computer Engineering professor Vahid Tarokh, will develop methods to improve the efficiency of information processing with adaptive decisions according to the structure of new incoming data. Students will have the opportunity to explore data-driven adaptive strategies based on neural networks and statistical learning models, investigate trade-offs between error threshold and computational complexity for various fundamental operations, and implement software prototypes. The outcome of this project can potentially speed up many systems and networks involving data sensing, acquisition, and computation.

Project Leads: Yi Feng, Vahid Tarokh

A team of students will explore new ways of reading pre-modern maps and perspectival views through image tagging, annotation, and 3D modeling. Each student will build a typology of icons found in these early maps (for example, houses, churches, roads, and rivers). By extracting, modeling, and cataloging these features, the team will create a library of 2D and 3D objects that will be used to (a) identify patterns in how space and power are represented across these maps and (b) create a model for “experiencing” these maps in 3D, using the Unity game engine platform. This is a combined Data+ / Bass Connections project that will instruct students in qualitative and quantitative mapping techniques, basic 3D modeling, and the history of cartography.

Project Leads: Philip Stern, Ed Triplett

Project Manager: Sam Horewood