Tips in Data Visualization for Genetic Mapping

Project Summary

The aim of this Data Expedition was for students to learn hands-on data visualization techniques using a variety of data types. Students first discussed why data visualization is useful and shared tips for making graphs both visually appealing and easy to understand.

Contact
C. Ryan Campbell
Biology
c.ryan.campbell@duke.edu

Graduate Students: Jenn Coughlan, Ryan Campbell

Course: Biology 490s - Methods in Comp Bio & Genomics

Over two 70-minute class periods, the students worked through two tutorials. The first introduced them to the basics of ggplot2, a data visualization package for R, the free statistical programming environment. Students were then given a homework assignment to visualize a simple genotype-phenotype dataset, ‘Coughlan_inversiongenopheno.csv’. In the second class, we began by discussing the homework assignment, including the challenges students encountered and possible next steps. Students were then given a much more complicated dataset of reduced-representation whole-genome data from the wildflower Senecio (from Roda et al. 2017, dataset ‘Fst_BSA_wLinkagegrp.csv’). Students used these data to associate survival with allele frequencies across different habitats and so identify regions of the genome that are associated with adaptation to edaphic (soil) conditions.
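The kind of summary that sits behind a simple genotype-phenotype plot can be sketched as follows. This is not the course tutorial, which used ggplot2 in R. It is a minimal Python illustration using only the standard library, and the column names (`genotype`, `phenotype`) and all values are invented placeholders rather than contents of the course dataset. It groups phenotype values by genotype class and reports the per-group minimum, median, and maximum, which are among the quantities a boxplot would display.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Toy stand-in for a genotype-phenotype table. The column names and values
# are invented for illustration; they are NOT taken from the course dataset.
TOY_CSV = """genotype,phenotype
AA,10.0
AA,12.0
AA,11.0
AB,14.0
AB,15.0
AB,16.0
BB,20.0
BB,19.0
BB,21.0
"""


def phenotype_by_genotype(rows):
    """Group phenotype values (as floats) by genotype class."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["genotype"]].append(float(row["phenotype"]))
    return dict(groups)


def summarize(groups):
    """Return {genotype: (n, min, median, max)}, ordered by genotype label."""
    return {
        genotype: (len(values), min(values), median(values), max(values))
        for genotype, values in sorted(groups.items())
    }


# Parse the toy table and build the per-genotype summary.
rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
summary = summarize(phenotype_by_genotype(rows))

for genotype, (n, low, mid, high) in summary.items():
    print(f"{genotype}: n={n} min={low} median={mid} max={high}")
```

To try this on a real file, the `io.StringIO(TOY_CSV)` source could be swapped for an opened CSV file, with the two column names adjusted to match that file's header.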

Download the course slides (PDF).

 

Related Projects

Marine mammals exhibit extreme physiological and behavioral adaptations that allow them to dive hundreds to thousands of meters underwater despite their need to breathe air at the surface. Through the development of new remote monitoring technologies, we are just beginning to understand the mechanisms by which they are able to execute these extreme behaviors. Long-term animal-borne tags can now record location, dive depth, and dive duration and then transmit these data to satellite receivers, enabling remote access to behavior occurring both many kilometers out to sea and several kilometers below the ocean surface.

The aim of our Data Expeditions course was to give students in Bio 190S-0.2, a summer session course in sensory systems, an introduction to what real data look like and how they are analyzed. Over the course of a two-hour class session, 16 students ranging from 16 to 22 years old were given the opportunity to explore a dataset on the color vision capabilities of three species of cleaner shrimp.

Matt and Ken led two labs for the engineering section of STA 111/130, an introductory course in statistics and probability. They wrote the lab assignments to bridge the gap between introductory linear regression, which is often explained in terms of a static, complete dataset, and time series analysis, which is not a common topic in introductory courses.