Tips in Data Visualization for Genetic Mapping

Project Summary

The aim of this Data Expedition was to give students hands-on practice with data visualization techniques across a variety of data types. Students first discussed why data visualization is useful and shared tips for making graphs that are both visually appealing and easy to understand. 

Contact
C. Ryan Campbell
Biology
c.ryan.campbell@duke.edu

Graduate Students: Jenn Coughlan, Ryan Campbell

Course: Biology 490s - Methods in Comp Bio & Genomics

Over two 70-minute class periods, students worked through two tutorials. The first introduced the basics of ggplot2, a data visualization package for the free statistical programming language R, and students were then given a homework assignment to visualize a simple genotype-phenotype dataset, ‘Coughlan_inversiongenopheno.csv’. In the second class, we began by discussing the homework, including the challenges students ran into and possible next steps. Students were then given a much more complex dataset: reduced-representation, genome-wide sequencing data from the wildflower Senecio (Roda et al. 2017; dataset ‘Fst_BSA_wLinkagegrp.csv’). Students used these data to relate survival to allele frequencies across habitats and to identify regions of the genome associated with adaptation to edaphic (soil) conditions. 
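The page does not spell out how students linked survival to allele frequencies. One common way to screen data like ‘Fst_BSA_wLinkagegrp.csv’ for differentiated regions is to compute F_ST between two pools at each SNP (for example, two habitats, or survivors versus the starting population), then look at the upper tail and at averages per linkage group. The Python sketch below is a minimal illustration under stated assumptions, not the course's analysis (which used R and ggplot2): the tuple layout (linkage_group, position, p_pool1, p_pool2), the function names, and the toy frequencies are all hypothetical, and F_ST here is Wright's (H_T - H_S) / H_T with the two pools weighted equally.

```python
from collections import defaultdict
from statistics import mean


def fst_two_pools(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic SNP, weighting the two pools equally.

    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    combined pools and H_S is the mean expected heterozygosity within pools.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # A site fixed for the same allele in both pools carries no information.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def outlier_snps(snps, quantile=0.99):
    """Return (linkage_group, position, fst) for SNPs at or above an empirical F_ST quantile.

    `snps` is a list of (linkage_group, position, p_pool1, p_pool2) tuples
    (a hypothetical layout, not the course file's actual columns).
    """
    scored = [(lg, pos, fst_two_pools(p1, p2)) for lg, pos, p1, p2 in snps]
    if not scored:
        return []
    ranked = sorted(f for _, _, f in scored)
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return [s for s in scored if s[2] >= cutoff]


def mean_fst_by_linkage_group(snps):
    """Average the per-SNP F_ST values within each linkage group."""
    by_group = defaultdict(list)
    for lg, _, p1, p2 in snps:
        by_group[lg].append(fst_two_pools(p1, p2))
    return {lg: mean(values) for lg, values in by_group.items()}


# Toy allele frequencies, invented for illustration (not Senecio data).
snps = [
    ("LG1", 100, 0.50, 0.52),
    ("LG1", 250, 0.30, 0.28),
    ("LG2", 40, 0.90, 0.10),   # strongly differentiated site
    ("LG2", 900, 0.45, 0.40),
]
print(outlier_snps(snps, quantile=0.75))
print(mean_fst_by_linkage_group(snps))
```

In this toy example only the LG2 site at position 40 reaches the 75th-percentile cutoff. The quantile threshold is a simple empirical screen and says nothing about statistical significance; with real data, the per-SNP values or per-linkage-group means are what one would plot along the genome, for example with ggplot2.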

Download the course slides (PDF).


Related Projects

The aim of this Data Expedition was to give students an introduction to stable isotopes and to how isotope data can be used to understand trophic dynamics. 

A team of students led by Janet Bettger and an interdisciplinary team with the 6th Vital Sign Study will use Census and other public data to examine how representative the participants in this smartphone-based population health study are. Students will design an online interactive map and other web-based tools that can be easily updated as new participants join, illustrating key relationships such as those between health status and rurality, medical service availability, and sociodemographics. The online tools will be used to direct education efforts on the importance of walking speed as a marker of health and as the sixth vital sign. Findings from the data analysis will be used by GANDHI to direct the scale-up of smartphone-based research in target geographic areas and among specific population subgroups, such as older adults and people with chronic illness.

A team of students led by faculty and researchers at the Social Science Research Institute will bring together data that will facilitate research using social determinants of health (SDH) to examine, understand, and ameliorate health disparities. This project will identify SDH variables that have the potential to be linked to data from the MURDOCK Study, a longitudinal health study based in Cabarrus County, NC. Much of this data – information relevant to understanding socioeconomic status, education, the physical and social environment, employment, and social support networks – is publicly available or easily obtained, and its aggregation and analysis offer opportunities to significantly improve predictions of health risks and to improve personalized care. Students will evaluate potential data sources, develop ethical policies to protect respondent privacy, clean and merge data, create documentation for data sharing and reuse, and use statistical tools and neighborhood mapping software to examine patterns of disparity.