Predicting Pancreatic Cancer

Project Summary

Albert Antar (Biology) and Zidi Xiu (Biostatistics) spent ten weeks leveraging Duke Electronic Medical Record (EMR) data to build predictive models of pancreatic ductal adenocarcinoma (PDAC). PDAC is the 4th leading cause of cancer deaths in the US and is most often diagnosed in stage IV, with a survival rate of only 1% and life expectancy measured in months. Diagnosis of PDAC is very challenging due to the deep anatomical placement of the pancreas and the significant risk imposed by traditional biopsy. The goal of this project is to use EMR data to identify potential avenues for diagnosing PDAC in the early, treatable stages of disease.

Themes and Categories
Year
2016

Project Results

The team first constructed a patient timeline leading up to PDAC using diagnostic codes in the EMR data. They then applied a supervised topic model to the diagnosis code data, resulting in highly interpretable groups of diagnoses, and a promising predictive model for pancreatic cancer. The study team is following up with clinical colleagues in the Duke Department of Medicine to initiate further studies based on the team's work.

Faculty Sponsors

Project Manager

  • Shaobo Han, post-doc, Electrical and Computer Engineering

Participants

  • Albert Antar, Duke University Biology
  • Zidi Xiu, Student Mentor, Master's student Biostatistics, Duke University

Disciplines Involved

  • Biostatistics
  • Pre-med
  • All quantitative STEM

Related People

Related Projects

Nationally, there is a disproportionate number of children of color (African American and Latino) in the child welfare system. Durham County is no different. However, this problem has not yet been examined through the lens of data in order to formulate or implement possible solutions. Durham County Department of Social Services Child & Family Services would like to evaluate its systems to identify where and how disproportionality and disparity are occurring. Is it occurring at the entry point of reporting child abuse and neglect? Is it occurring at the case decision? Is our reunification time different for African American children? Or does it take longer for a child of color to achieve permanence through adoption? Organizing the data to show us our “hot spots” would facilitate further discussion and focus on solutions to an age-old systemic problem.

Faculty Lead: Greg Herschlag

Project Lead: Jovetta L Whitfield

Student teams will develop a benchmark dataset and explore its efficacy in an in-house competition, where they will put innovative techniques such as machine learning to the test through a series of challenges. A team of students will develop benchmark data pertaining to network performance in the presence of intentional and unintentional degradation, ranging from sensor failure and additive noise to adversarial interference. The students will analyze the baseline performance of the network and measure the performance of the degraded network with and without techniques designed to improve robustness. Students will have the opportunity to present findings to scientists and engineers from the Air Force Research Laboratory.

Faculty leads: Robert Calderbank, Vahid Tarokh, Ali Pezeshki

Client leads: Dr. Lauren Huie, Dr. Elizabeth Bentley, Dr. Zola Donovan, Dr. Ashley Prater-Bennette, Dr. Erin Trip

A team of students will use visual reporting tools, such as Tableau or Power BI, to create a dynamic dashboard that will enable investment professionals at Duke University Management Company (DUMAC) to review and better understand fund managers’ exposures and positioning across various dimensions. Students will collaborate with teams at DUMAC to develop an intuitive, visual dashboard to help the investment team review individual and portfolio-level exposures across numerous asset classes. This dashboard will be connected to DUMAC’s data warehouse, which refreshes daily as new data comes in for DUMAC’s public market accounts.