A Bayesian Framework for Joint Analysis of Heterogeneous Neuroscience Data

Project Summary

This paper addresses analysis of heterogeneous data, such as ordered, categorical, real and count data. Such data are of interest in our motivating application, cognitive and brain science, in which subjects may answer questionnaires, and also (separately) undergo fMRI interrogation. A contribution of this paper concerns the joint analysis of how people answer questionnaires and how their brain responds to external stimuli (here visual), the latter measured via fMRI.

Themes and Categories

In this paper we ask a novel and practical question, which to our knowledge has not been considered previously: can one predict the fMRI response (here from the amygdala and the ventral striatum) to external stimuli, based upon knowledge of how the subject answers a questionnaire and genetic data?

A new model is developed for joint analysis of ordered, categorical, real and count data. In the motivating application, the ordered and categorical data are answers to questionnaires, the (word) count data correspond to the text questions from the questionnaires, and the real data correspond to fMRI responses for each subject. We also combine the analysis of these data with single-nucleotide polymorphism (SNP) data from each individual. The questionnaires considered here correspond to standard psychological surveys, and the study is motivated by psychology and neuroscience. The proposed Bayesian model infers sparse graphical models (networks) jointly across people, questions, fMRI stimuli and brain activity, integrated within a new matrix factorization based on latent binary features. We demonstrate how the learned model may take fMRI and SNP data from a subject as inputs, and predict (impute) how the individual would answer a psychological questionnaire; going in the other direction, we also use an individual's SNP data and answers from questionnaires to impute unobserved fMRI data. Each of these two imputation settings has practical and theoretical applications for understanding human behavior and mental health, which are discussed.
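
The imputation idea above, in which data types share latent binary features so that one observed modality can be used to predict another, can be illustrated with a toy sketch. This is not the Bayesian model from the paper: it assumes the loadings are already known, replaces posterior inference with a brute-force least-squares search over binary vectors, and uses fixed cutpoints to turn latent scores into ordered answers. The function names, the loading matrices and the cutpoints are all hypothetical, chosen by hand for illustration.

```python
import itertools

import numpy as np


def infer_binary_features(x, loadings):
    """Return the binary vector z minimizing ||x - z @ loadings||^2 (brute force)."""
    n_features = loadings.shape[0]
    # Enumerate all 2^K binary feature configurations (only feasible for small K).
    candidates = np.array(list(itertools.product([0, 1], repeat=n_features)))
    residuals = candidates @ loadings - x
    return candidates[np.argmin((residuals ** 2).sum(axis=1))]


def impute_ordinal_answers(z, question_loadings, cutpoints):
    """Map latent scores z @ question_loadings to ordinal levels via fixed cutpoints."""
    return np.digitize(z @ question_loadings, cutpoints)


# Hypothetical, hand-picked parameters (assumptions, not values from the paper):
# 2 latent binary features, 2 fMRI measurements, 2 questionnaire items.
fmri_loadings = np.array([[1.0, 0.0], [0.0, 2.0]])
question_loadings = np.array([[1.5, -2.0], [0.0, 0.5]])
cutpoints = np.array([-1.0, 0.0, 1.0])  # splits latent scores into 4 ordered levels

# A made-up noisy fMRI observation for one subject.
observed_fmri = np.array([0.9, 2.1])

z_hat = infer_binary_features(observed_fmri, fmri_loadings)
imputed_answers = impute_ordinal_answers(z_hat, question_loadings, cutpoints)
print(z_hat, imputed_answers)
```

Running the same two steps in the opposite order, with questionnaire answers as the observed input, would give a comparably rough analogue of imputing fMRI responses. A full treatment would infer the binary features, loadings and graphical structure jointly rather than fixing them in advance.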

Related Projects

Shannon Houser (Stats/BioChem), Junbo Guan (MIDS), and Gaurav Sirdeshmukh (Stats) spent ten weeks exploring data concerning child and family health in Yolo County, CA. Using R Shiny, the team produced an interactive data dashboard that enables Yolo County residents to find healthcare and childcare providers, food resources, and transportation information.

Project Lead: Leigh Ann Simmons (UC Davis)

Sean Fiscus (Math/Econ/EnvEng), Alyssa Shi (Stats), Yamil Lopez-Ruiz (BME/CS), and Emmanuel Mokel (Stats/Math) spent ten weeks working with data from CovIdentify, a study that focuses on using wearables to predict and diagnose COVID-19 and the flu. The team improved the memory efficiency of analytic pipelines and added capacity to ingest different types of data. This project built upon the work accomplished by the Duke Bass Connections team and the Duke MIDS capstone project.


Project Lead: Jessilyn Dunn

A large and growing trove of patient, clinical, and organizational data is collected as a part of the “Help Desk” program at Durham’s Lincoln Community Health Center. Help Desk is a group of student volunteers who connect with patients over the phone and help them navigate to community resources (like food assistance programs, legal aid, or employment centers). Data-driven approaches to identifying service gaps, understanding the patient population, and uncovering unseen trends are important for improving patient health and advocating for the necessity of these resources. Disparities in food security, economic stability, education, neighborhood and physical environment, community and social context, and access to the healthcare system are crucial social determinants of health, which studies indicate account for nearly 70% of all health outcomes.