Validating a Topic Model that Predicts Pancreatic Cancer from Latent Structures in the Electronic Medical Record

Project Summary

Building on a 2016 Data+ team's work on predictive modeling of pancreatic cancer, students Siwei Zhang (Master's, Biostatistics) and Jake Ukleja (Computer Science) spent ten weeks building a model to predict pancreatic cancer from electronic medical record (EMR) data. They worked with nine years' worth of EMR data, including ICD9 diagnostic codes, covering more than 200,000 patients.

Year: 2017

Contact: Paul Bendich (bendich@math.duke.edu)

Project Results: The team began with exploratory data analysis of the median time of appearance and the frequency of specific ICD9 codes, with an eye toward understanding how these statistics relate to pancreatic cancer diagnosis. They then trained a topic model on ICD9 codes that predicted past pancreatic cancer diagnosis with an AUC of 93 percent. Finally, they used the topic model's output to identify a pool of high-risk patients for potential future study.
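For readers unfamiliar with the AUC figure quoted above, the following is a minimal sketch of what the metric measures: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `auc` and the toy labels and scores are hypothetical and invented purely for illustration; they are not the team's data, and the summary does not describe how the team computed its metric.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = diagnosed, 0 = not diagnosed.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]

toy_auc = auc(labels, scores)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, so it is suitable only for small illustrations like this one.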

Click here for the Executive Summary

Project Leads:

Lisa Satterwhite, PhD

James Abbruzzese, MD

Joseph Lucas, PhD

Project Manager: Tyler Massaro

Related Projects

Alexa Goble (Finance) joined Econ majors Chavez Cheong and Eli Levine in a ten-week exploration of mortgage enforcement actions related to the global financial crisis. Using NLP techniques on mortgage data from Ohio and Massachusetts, the team validated a new experimental approach to understanding the dynamics among state regulatory agencies, mortgage lenders, brokers, and loan originators. This project was a continuation of two previous Data+ projects:

https://bigdata.duke.edu/projects/american-predatory-lending-global-financial-crisis

https://bigdata.duke.edu/projects/american-predatory-lending-and-global-financial-crisis-year-2

View the team's project poster here

Project Lead: Lee Reiners

Project Manager: Malcolm Smith Fraser

Stats/Sociology major Mitchelle Mojekwu joined Neuroscience majors Kassie Hamilton and Zineb Jaidi in a ten-week exploration of data relevant to an upcoming public school zone redistricting in Durham County. Using information acquired from the General Social Survey and the US Census, the team applied modern mathematical and statistical methods for generating proposed redistricting plans, with the aim of providing decision-makers with information they can use to produce school districts that are equitable and reflective of the Durham County student population.

View the team's project poster here

Faculty Lead: Greg Herschlag

Project Manager: Bernard Coles

Pryia Juarez (BME/ECE), Jonathan Pilland (ECE/BME), and Matthew Traum (CS/Econ) spent ten weeks analyzing sensor data synthesized by an agile waveform generator. The team used deep reinforcement learning techniques to understand the performance of different synthetic agents representing potential attackers of the sensor system.

View the team's project poster here

Faculty leads: Robert Calderbank, Vahid Tarokh, Ali Pezeshki

Client leads: Dr. Lauren Huie, Dr. Elizabeth Bentley, Dr. Zola Donovan, Dr. Ashley Prater-Bennette, Dr. Erin Trip

Project Manager: Suya Wu