Exploring the genetic basis of yeast biofilms

Project Summary

Students learned to visualize high-dimensional gene expression data; understand genetic differences in the context of gene networks; connect genetic differences to physiological outcomes; and perform simple analyses using the R programming language.

Contact
Paul Bendich
bendich@math.duke.edu

Graduate students: Liana Burghardt and Colin Maxwell, PhD candidates, Biology Department

Faculty instructor: Danielle Armaleo

Course: Collaboration with Dr. Armaleo in Bio 214 (Cellular and Molecular Biology)

Read the report to learn more (PDF).

Related Projects

Large, publicly available environmental databases are a tremendous resource for scientists and for members of the general public interested in climate trends and properties. However, without the programming skills to parse and interpret these massive datasets, significant trends may remain hidden from both groups. In this data exploration, students spent three hours working with two large, publicly available datasets, each containing more than 4 million observations. They learned how to use R and RStudio to organize, visualize, and statistically explore trends in deep-sea physical oceanography.

Our aim was to introduce students to the wealth of possibilities that human genotyping and sequencing hold by illustrating firsthand the power of these datasets to identify genetic relatives, using the story of how the Golden State Killer was captured with the help of public genetic databases.

This Data Expedition introduced hypothesis-driven data analysis in R and the concept of circular data, and provided some tools for importing and analyzing such data in R.