Tips in Data Visualization for Genetic Mapping

Project Summary

The aim of this Data Expedition was for students to learn hands-on data visualization techniques using a variety of data types. Students first discussed why data visualization is useful and shared tips for making graphs both visually appealing and easy to understand.

Year: 2017

Contact: C. Ryan Campbell, Biology, c.ryan.campbell@duke.edu

Graduate Students: Jenn Coughlan, Ryan Campbell

Course: Biology 490s - Methods in Comp Bio & Genomics

Over two 70-minute class periods, the students worked through two tutorials. The first introduced them to the basics of ggplot2, a data visualization package for the free statistical programming environment R. Students were then given a homework assignment to visualize a simple genotype-phenotype dataset, ‘Coughlan_inversiongenopheno.csv’. In the second class, we began by discussing the homework assignment, including the challenges students encountered and possible next steps. Students were then given a much more complicated dataset of reduced-representation whole-genome data from the wildflower Senecio (from Roda et al. 2017, dataset ‘Fst_BSA_wLinkagegrp.csv’). Students used these data to associate survival with allele frequencies across different habitats and so identify regions of the genome that are associated with adaptation to edaphic (soil) conditions.
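As a rough illustration of the kind of plot the homework called for, the sketch below draws phenotype by genotype with ggplot2. It is not taken from the course tutorials. The column names (genotype, phenotype), the genotype labels, and the simulated values are all hypothetical stand-ins, because this summary does not describe the contents of ‘Coughlan_inversiongenopheno.csv’.

```r
library(ggplot2)

# Hypothetical stand-in data: the real columns and values of
# 'Coughlan_inversiongenopheno.csv' are not described in this summary.
set.seed(1)
geno_pheno <- data.frame(
  genotype  = factor(rep(c("AA", "AB", "BB"), each = 20),
                     levels = c("AA", "AB", "BB")),
  phenotype = c(rnorm(20, mean = 10),
                rnorm(20, mean = 12),
                rnorm(20, mean = 14))
)

# One box per genotype, with the individual observations jittered on top
# so the reader sees both the summary and the raw data.
p <- ggplot(geno_pheno, aes(x = genotype, y = phenotype)) +
  geom_boxplot(outlier.shape = NA) +       # hide outlier dots; raw points are drawn below
  geom_jitter(width = 0.15, alpha = 0.6) + # spread points horizontally to reduce overlap
  labs(x = "Inversion genotype", y = "Phenotype (arbitrary units)") +
  theme_classic()

print(p)
```

With the real file, the simulated data frame would presumably be replaced by a call to read.csv() and the aesthetics mapped to whatever columns the file actually contains.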

Download the course slides (PDF).

 

Related Projects

This Data Expedition introduces students to network tools and approaches and invites students to consider the relationship(s) between social networks and social imaginaries. Using foundation-funding data collected from the Foundation Directory Online, the Data Expedition enables students to visualize and explore the relationship between networks, social imaginaries, and funding for higher education. The Data Expedition is based on two sets of data. The first set lists the grants received by Duke University in 2016 from five foundations: The Bill and Melinda Gates Foundation, Fidelity Charitable Gift Fund, Silicon Valley Community Foundation, The Community Foundation of Western North Carolina, and The Robert Wood Johnson Foundation. The second set lists the names of board members from Duke University and each of these five foundations, along with the degree-granting institution for their undergraduate education. For the sake of this exercise, the degree-granting institution data were fabricated from a randomized list of the top twenty-five undergraduate institutions.

This Data Expedition seeks to introduce students to statistical analysis in the field of international development. Students construct an index of wealth/poverty based on asset holdings using four datasets collected under the umbrella of the Living Standards Measurement Survey project at the World Bank. We selected countries to represent different continents with comparable and recent survey data: Bulgaria (2007), Tajikistan (2009), Tanzania (2010-2011), and Panama (2008).

First, we construct an index of wealth based on household assets in the different countries using Principal Component Analysis. Once the index is constructed, students seek to understand the main drivers of wealth and poverty in each country. We include variables for health, education, age, relationship to the household head, and sex, and students use regression analysis to identify which of these drive poverty in each country.

This data expedition explores the local (ego) patent citation networks of three hybrid vehicle-related patents. The concept of patent citations and technological development is a core theme in innovation and entrepreneurship, and the purpose of these network explorations is to both quantitatively and visually assess how innovations are connected and what these connections mean for the focal innovations and the technologies that draw on those patents in the future. The expedition was incorporated as part of the Sociology of Entrepreneurship class, where students are thinking about the emergence and diffusion of innovations.