Validating a Topic Model that Predicts Pancreatic Cancer from Latent Structures in the Electronic Medical Record

Project Summary

Building on the work of a 2016 Data+ team, students Siwei Zhang (Master's, Biostatistics) and Jake Ukleja (Computer Science) spent ten weeks building a model to predict pancreatic cancer from electronic medical record (EMR) data. They worked with nine years' worth of EMR data, including ICD9 diagnostic codes, covering more than 200,000 patients.

Themes and Categories
Year
2017
Contact
Paul Bendich
bendich@math.duke.edu

Project Results: The team began with exploratory data analysis of the median time of appearance and the frequency of specific ICD9 codes, with an eye toward understanding how these statistics relate to pancreatic cancer diagnosis. They then trained a topic model that predicted past pancreatic cancer diagnosis from ICD9 codes with high accuracy (an AUC of 0.93). Finally, they used the topic model outcomes to identify a pool of high-risk patients for potential future study.
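For readers unfamiliar with the AUC figure quoted in the results, the short sketch below shows one standard way to compute it: as the fraction of (diagnosed, not diagnosed) patient pairs in which the diagnosed patient receives the higher risk score. This is an illustration only and is not the team's code; the function name `roc_auc` and the toy labels and scores are made up for the example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy, made-up data (not from the project): 1 = diagnosed, 0 = not diagnosed.
# The scores stand in for hypothetical model risk outputs.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")  # 11 of 12 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value of 0.93 means that roughly 93 percent of such pairs are ordered correctly. The pairwise loop is quadratic in the number of patients; on a dataset of the size described here, a rank-based implementation from a statistics library would be the practical choice.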

Click here for the Executive Summary

Project Leads:

Lisa Satterwhite, PhD

James Abbruzzese, MD

Joseph Lucas, PhD

Project Manager: Tyler Massaro

Related Projects

Brooke Erikson (Economics/Computer Science), Alejandro Ortega (Math), and Jade Wu (Computer Science) spent ten weeks developing open-source tools for automatic document categorization, PDF table extraction, and data identification. Their motivating application was provided by Power for All’s Platform for Energy Access Knowledge, and they frequently collaborated with professionals from that organization.

Click here to read the Executive Summary

Jake Epstein (Statistics/Economics), Emre Kiziltug (Economics), and Alexander Rubin (Math/Computer Science) spent ten weeks investigating the existence of relative value opportunities in global corporate bond markets. They worked closely with a dataset provided by a leading asset management firm.

Click here for the Executive Summary

Maksym Kosachevskyy (Economics) and Jaehyun Yoo (Statistics/Economics) spent ten weeks understanding temporal patterns in the used construction machinery market and investigating the relationship between these patterns and macroeconomic trends. They worked closely with a large dataset provided by MachineryTrader.com, and discussed their findings with analytics professionals from a leading asset management firm.

Click here to read the Executive Summary