Improving the Machine Learning Pipeline at Duke

Project Summary

A team of students will contribute to an effort to put distributed computing methods into practice for the analysis of electronic medical records (EMR) at Duke. Specifically, the team will compare conventional (Oracle Exadata) and distributed (Apache Spark) systems for analyzing EMR data and recommend how to implement them. Students will then use these systems to apply natural language processing (NLP) to clinical narratives and radiology notes, alongside existing, ongoing analyses of Duke data. This Data+ team will work with the Duke Forge, an interdepartmental collaboration focused on data science research and innovation in health and biomedical sciences.

Themes and Categories
Paul Bendich

Disciplines Involved: PreHealth/PreMed, BME, Economics, Biostatistics, all quantitative STEM

Project Leads: Ricardo Henao, Robert Overton

Project Managers: A.J. Overton, Ben Neely

