Diagnosing Diabetes and Predicting Complications

Project Summary

Priya Sarkar (Computer Science), Lily Zerihun (Biology and Global Health), and Anqi Zhang (Biostatistics) spent ten weeks using Duke Electronic Medical Record (EMR) data to identify subgroups of diabetic patients and predict future complications associated with Type II diabetes.

Themes and Categories

Year: 2016

Project Results

The team used t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce the dimensionality of data on prescribed medications, medical diagnoses, laboratory tests, and patient outcomes. They then performed K-means clustering to identify meaningful clusters of similar patients and explored what made the patients in each cluster similar. The team also constructed and tested statistical models to predict 13 common complications in diabetic patients, and found high predictive accuracy for several of these complications when leveraging the rich data available in the EMR.
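The clustering step can be illustrated with a minimal sketch. This is not the team's code: their implementation, feature encoding, and number of clusters are not described here. The sketch assumes the t-SNE step has already produced two-dimensional coordinates for each patient and substitutes synthetic points for that embedding. The names `make_toy_embedding`, `sq_dist`, and `kmeans` are hypothetical, and the farthest-first initialisation is a choice made only to keep the example deterministic.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm for K-means on a list of coordinate tuples.

    Initialisation is farthest-first (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from every centroid chosen so far.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


def make_toy_embedding(seed=0):
    """Synthetic stand-in for a 2-D t-SNE embedding: three separated groups."""
    rng = random.Random(seed)
    centers = [(-20.0, 0.0), (0.0, 20.0), (20.0, 0.0)]
    return [
        (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
        for cx, cy in centers
        for _ in range(50)
    ]


embedding = make_toy_embedding()
labels, centroids = kmeans(embedding, k=3)
cluster_sizes = sorted(labels.count(j) for j in range(3))
print("cluster sizes:", cluster_sizes)
```

On real EMR data the groups would rarely be this cleanly separated, so the number of clusters and the initialisation would need to be chosen and checked with more care than this toy example suggests.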

Project Manager

"Data+ provided an invaluable opportunity to work with motivated, hard-working students on exciting and challenging data problems. I learned so much about working with others, communicating effectively, and managing students with a variety of backgrounds. Though each of my students had a different level of statistics and coding experience, they made mentoring so easy with their hard work and interest in the project, as well as the effective organization of the summer as a whole. It was a great experience that I highly recommend to other graduate students!"
Liz Lorenzi, Ph.D. Candidate, Statistics

Participants

  • Lillian Zerihun, Duke University Biology & Global Health
  • Priya Sarkar, Duke University Computer Science
  • Anqi Zhang, Duke University Biostatistics

Disciplines Involved

  • Biostatistics
  • Public Health
  • All quantitative STEM

 

Related Projects

The Air Force’s F-15E Strike Eagle jets have parts that wear down and break, causing unscheduled maintenance events that take away valuable time in the air for critical missions and training. Our team, Limitless Data, is working with Seymour Johnson Air Force Base to mine manually entered maintenance data to visualize and predict aircraft failures. We created a prototype data visualization product that will help maintainers on the flight line identify and repair critical failures before they happen, keeping jets ready to fly, fight, and win.

 

Faculty Lead: Dr. Emma Rasiel

Client Lead: Lt. Devon Burger

Project Manager: Vignesh Kumaresan

This project aims to improve the computational efficiency of signal operations, e.g., sampling and multiplying signals. We design machine learning-based signal processing modules that use an adaptive sampling strategy and interpolation to generate a good approximation of the exact output. Digital signal processing systems that use these self-adjusting modules can be expected to run more efficiently while keeping the error level low.

Project Leads: Yi Feng, Vahid Tarokh

 

Mapping History has focused on the categorization, labelling, digitization, and 3D reconstruction of 16th- and 17th-century maps and atlases of London and Lisbon. Over the course of the summer, the Mapping History team has developed its own unique analytical dataset by painstakingly labelling every element contained within these maps, used Python to digitize this dataset, and, now in the project's final stage, has begun the process of reconstructing these historical perspectives in a 3D game engine.

Project Leads: Philip Stern, Ed Triplett

Project Manager: Sam Horewood

 
