A Bayesian Framework for Joint Analysis of Heterogeneous Neuroscience Data

Project Summary

This paper addresses the analysis of heterogeneous data, such as ordered, categorical, real-valued and count data. Such data arise in our motivating application, cognitive and brain science, in which subjects may answer questionnaires and also, separately, undergo fMRI scanning. One contribution of this paper is the joint analysis of how people answer questionnaires and how their brains respond to external (here, visual) stimuli, the latter measured via fMRI.

In this paper we ask a novel and practical question which, to our knowledge, has not been considered previously: can one predict the fMRI response to external stimuli (here, in the amygdala and the ventral striatum) from knowledge of how a subject answers a questionnaire, together with that subject's genetic data?

A new model is developed for the joint analysis of ordered, categorical, real-valued and count data. In the motivating application, the ordered and categorical data are answers to questionnaires, the (word) count data correspond to the text of the questionnaire questions, and the real-valued data are the fMRI responses of each subject. We also incorporate single-nucleotide polymorphism (SNP) data from each individual into the analysis. The questionnaires considered here are standard psychological surveys, and the study is motivated by psychology and neuroscience. The proposed Bayesian model infers sparse graphical models (networks) jointly across people, questions, fMRI stimuli and brain activity, integrated within a new matrix factorization based on latent binary features. We demonstrate how the learned model can take a subject's fMRI and SNP data as inputs and predict (impute) how that individual would answer a psychological questionnaire; in the other direction, we also use an individual's SNP data and questionnaire answers to impute unobserved fMRI data. Both imputation settings have practical and theoretical applications for understanding human behavior and mental health, which are discussed.
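The idea of imputing one data type from another through a factorization with latent binary features can be illustrated with a deliberately simplified sketch. It is not the paper's model. It assumes each subject is described by a short vector of latent binary features `z`, that every data column (a real-valued fMRI response or an ordinal questionnaire answer) has a latent score equal to `z` times a hypothetical loading matrix `W`, and that ordinal answers come from thresholding that score at hypothetical cut points. In place of full Bayesian inference with sparse graphical structure, it picks a single best `z` by exhaustive search over all 2^K binary vectors under a squared-error criterion (Gaussian noise with a flat prior), using only the observed real-valued columns, and then fills in the missing columns. Categorical and count data, SNPs, priors and uncertainty are all omitted, and every name here (`infer_binary_features`, `impute`, `W`, the thresholds) is illustrative.

```python
from itertools import product


def predict_latent(z, W):
    """Latent score for every column: binary feature vector z (length K) times W (K x P)."""
    num_cols = len(W[0])
    return [sum(z[k] * W[k][j] for k in range(len(z))) for j in range(num_cols)]


def infer_binary_features(x_obs, W):
    """Return the binary vector z whose reconstruction best fits the observed entries.

    x_obs maps column index -> observed real value. All 2^K candidates are tried,
    which is only feasible for small K (illustrative stand-in for Bayesian inference).
    """
    K = len(W)
    best_z, best_err = None, float("inf")
    for z in product((0, 1), repeat=K):
        mu = predict_latent(z, W)
        err = sum((x - mu[j]) ** 2 for j, x in x_obs.items())
        if err < best_err:
            best_z, best_err = z, err
    return best_z


def to_ordinal(value, thresholds):
    """Map a latent score to an ordered category 0..len(thresholds) by counting cut points exceeded."""
    return sum(value > t for t in thresholds)


def impute(x_obs, W, ordinal_cols, thresholds):
    """Infer z from observed columns, then predict every unobserved column."""
    z = infer_binary_features(x_obs, W)
    mu = predict_latent(z, W)
    imputed = {}
    for j, score in enumerate(mu):
        if j in x_obs:
            continue  # keep observed entries as they are
        imputed[j] = to_ordinal(score, thresholds) if j in ordinal_cols else score
    return z, imputed


# Hypothetical example: columns 0-1 play the role of fMRI responses,
# columns 2-3 the role of ordinal questionnaire items.
W = [[1.0, 0.0, 2.0, -1.0],
     [0.0, 1.0, 0.5, 1.5]]
z, imputed = impute({0: 0.9, 1: 1.1}, W, ordinal_cols={2, 3}, thresholds=[0.0, 1.0, 2.0])
print(z, imputed)  # (1, 1) {2: 3, 3: 1}
```

Running the same functions with questionnaire columns observed and fMRI columns missing would correspond to the reverse imputation direction. Exhaustive search over `z` only scales to a handful of features; a Bayesian treatment would instead average or sample over feature configurations rather than commit to a single one.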
