Tips in Data Visualization for Genetic Mapping

Project Summary

The aim of this Data Expedition was for students to learn data visualization techniques hands-on, using a variety of data types. Students first discussed why data visualization is useful and shared tips for making graphs that are both visually appealing and easy to understand.

Themes and Categories
C. Ryan Campbell

Graduate Students: Jenn Coughlan, Ryan Campbell

Course: Biology 490s - Methods in Comp Bio & Genomics

Over two 70-minute class periods, students worked through two tutorials. The first introduced them to the basics of ggplot2, a data visualization package for the free statistical programming language R. Students were then given a homework assignment to visualize a simple genotype-phenotype dataset, ‘Coughlan_inversiongenopheno.csv’. In the second class, we began by discussing the homework, including the challenges students ran into and possible next steps. Students were then given a much more complex dataset of reduced-representation, genome-wide sequence data from the wildflower Senecio (from Roda et al. 2017, dataset ‘Fst_BSA_wLinkagegrp.csv’). Students used these data to relate survival to allele frequencies across different habitats and to identify regions of the genome associated with adaptation to edaphic (soil) conditions.

Download the course slides (PDF).

