Predicting Pancreatic Cancer

Project Summary

Albert Antar (Biology) and Zidi Xiu (Biostatistics) spent ten weeks leveraging Duke Electronic Medical Record (EMR) data to build predictive models of pancreatic ductal adenocarcinoma (PDAC). PDAC is the fourth leading cause of cancer deaths in the US and is most often diagnosed in stage IV, with a survival rate of only 1% and life expectancy measured in months. Diagnosis of PDAC is very challenging because of the deep anatomical placement of the pancreas and the significant risk posed by traditional biopsy. The goal of this project is to use EMR data to identify potential avenues for diagnosing PDAC in the early, treatable stages of disease.

Themes and Categories
Year
2016

Project Results

The team first constructed a patient timeline leading up to PDAC using diagnostic codes in the EMR data. They then applied a supervised topic model to the diagnosis code data, which yielded highly interpretable groups of diagnoses and a promising predictive model for pancreatic cancer. The study team is following up with clinical colleagues in the Duke Department of Medicine to initiate further studies based on the team's work.
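The timeline step can be illustrated with a minimal sketch. It is not the team's code: the record layout, the function name `build_timelines`, and the use of ICD-9 157.x and ICD-10 C25.x prefixes to mark a pancreatic cancer diagnosis are all assumptions made for illustration, and the sample records are invented. The supervised topic model itself is not shown.

```python
from collections import defaultdict
from datetime import date

# Assumption: pancreatic cancer is identified by these code prefixes
# (ICD-9 157.x, ICD-10 C25.x). The source does not say which codes were used.
PANCREATIC_CANCER_PREFIXES = ("157", "C25")


def build_timelines(records):
    """Group (patient_id, visit_date, code) records into per-patient timelines.

    Returns {patient_id: [(days_before_diagnosis, code), ...]} for patients
    with at least one pancreatic cancer code. Only codes recorded strictly
    before the first such code are kept, ordered from earliest to latest.
    """
    by_patient = defaultdict(list)
    for patient_id, visit_date, code in records:
        by_patient[patient_id].append((visit_date, code))

    timelines = {}
    for patient_id, visits in by_patient.items():
        cancer_dates = [d for d, c in visits
                        if c.startswith(PANCREATIC_CANCER_PREFIXES)]
        if not cancer_dates:
            continue  # no cancer diagnosis on record for this patient
        index_date = min(cancer_dates)
        prior = sorted((d, c) for d, c in visits if d < index_date)
        timelines[patient_id] = [((index_date - d).days, c) for d, c in prior]
    return timelines


# Invented example records, for illustration only.
records = [
    ("p1", date(2014, 3, 1), "250.00"),
    ("p1", date(2015, 1, 10), "577.1"),
    ("p1", date(2015, 6, 1), "157.9"),
    ("p2", date(2015, 2, 2), "401.9"),
]
timelines = build_timelines(records)
```

Each timeline lists the diagnosis codes a patient accumulated before the first cancer code, together with how many days before that diagnosis each code was recorded. A representation like this could then be turned into per-patient code counts as input to a topic model.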

Download the Executive Summary (PDF)

Project Manager

  • Shaobo Han, post-doc, Electrical and Computer Engineering

Participants

  • Albert Antar, Duke University Biology
  • Zidi Xiu, Student Mentor, Master's student in Biostatistics, Duke University

Disciplines Involved

  • Biostatistics
  • Pre-med
  • All quantitative STEM

Related Projects

Social and environmental contexts are increasingly recognized as factors that affect patients' health outcomes. This team will have the opportunity to collaborate directly with clinicians and work with medical data in a real-world setting. They will examine the association between social determinants and risk prediction for hospital admissions, and assess whether social determinants bias that risk in a systematic way. Applied methods will include machine learning, risk prediction, and assessment of bias. This Data+ project is sponsored by the Forge, Duke's center for actionable data science.

Project Leads: Shelly Rusincovitch, Ricardo Henao, Azalea Kim

Project Manager: Austin Talbot

Aaron Chai (Computer Science, Math) and Victoria Worsham (Economics, Math) spent ten weeks building tools to understand characteristics of successful oil and gas licenses in the North Sea. The team used data-scraping, merging, and OCR methods to create a dataset containing license information and work obligations, and they also produced ArcGIS visualizations of license and well locations. They had the chance to consult frequently with analytics professionals at ExxonMobil.

Click here to read the Executive Summary

Project Lead: Kyle Bradbury

Project Manager: Artem Streltsov

Yueru Li (Math) and Jiacheng Fan (Economics, Finance) spent ten weeks investigating abnormal behavior by companies bidding for oil and gas rights in the Gulf of Mexico. Working with data provided by the Bureau of Ocean Energy Management and ExxonMobil, the team used outlier detection methods to automate the flagging of abnormal behavior, and then used statistical methods to examine various factors that might predict such behavior. They had the chance to consult frequently with analytics professionals at ExxonMobil.
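The summary does not say which outlier detection methods the team used. As one common possibility, the sketch below flags values by their modified z-score, computed from the median and the median absolute deviation (MAD). The function name `flag_outliers`, the threshold of 3.5, and the bid values are illustrative assumptions, not details from the project.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. The threshold of 3.5 is a common rule of
    thumb, assumed here for illustration.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Invented bid amounts (arbitrary units), for illustration only.
bids = [1.0, 1.2, 0.9, 1.1, 1.05, 9.5]
flagged = flag_outliers(bids)
```

Because the median and MAD are themselves resistant to extreme values, a single very large bid does not mask itself the way it can with a mean and standard deviation.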

 

Click here to read the Executive Summary

Project Lead: Kyle Bradbury

Project Manager: Hyeongyul Roh