Improving the Machine Learning Pipeline at Duke

Project Summary

A team of students will help put distributed computing methods into practice for the analysis of electronic medical records (EMR) at Duke. Specifically, the team will compare conventional (Oracle Exadata) and distributed (Apache Spark) systems for analyzing EMR data and produce recommendations for implementation. Students will then use these systems to run natural language processing (NLP) on clinical narratives and radiology notes, in conjunction with existing, ongoing analyses of Duke data. This Data+ team will work with the Duke Forge, an interdepartmental collaboration focused on data science research and innovation in health and biomedical sciences.

Themes and Categories
Paul Bendich

Disciplines Involved: PreHealth/PreMed, BME, Economics, Biostatistics, all quantitative STEM

Project Leads: Ricardo Henao, Robert Overton

Project Managers: A.J. Overton, Ben Neely

Related Projects

We are seeking an exceptional researcher to work with Vahid Tarokh at the Information Initiative at Duke on the foundations of non-commutative information theory and on the design of algorithms for processing multimodal data based on these theoretical findings.

We are seeking up to two exceptional researchers to work with Vahid Tarokh at the Information Initiative at Duke on calculating the fundamental limits of learning for high-dimensional data, purely high-dimensional data, and sparse data, and on designing limit-achieving algorithms.

We are seeking an exceptional researcher to work with Vahid Tarokh at the Information Initiative at Duke on change detection for multimodal data and on algorithm design.