Data Expeditions

A Data Expedition is an element of an undergraduate course that introduces students to exploratory data analysis.

Pairs of graduate students, often from different disciplines, work with the course instructor to formulate a question that will engage the students, and a pathway through a dataset that will provide insight.

Graduate student participants will receive a travel grant. Browse our current projects to find opportunities.


The aim of our data expeditions course was to give students in Bio 190S-0.2, a summer session course in sensory systems, an introduction to how real data may actually look and how they may actually be analyzed. Over the course of a two-hour class session, 16 students ranging from 16-22 years old were given the opportunity to explore a dataset on the color vision capabilities of three species of cleaner shrimp.

Matt and Ken led two labs for the engineering section of STA 111/130, an introductory course in statistics and probability. The lab assignments were written by Matt and Ken in order to bridge the gap between introductory linear regression, which is often explained in terms of a static, complete dataset, and time series analysis, which is not a common topic in introductory courses. 

Graduate Students: Kendra Kaiser and John Mallard

Faculty: Michael O’Driscoll

Course: Landscape Hydrology, EOS 323/723

Graduate Student: Jacob Coleman, 3rd year Ph.D. student in Statistical Science

Faculty Instructor: Colin Rundel

Class: STA 112, Data Science

Graduate student: Hamza Ghadyali          

Faculty instructor: Dr. Paul Bendich

Course: MATH 412 – Topology with Applications

In this Data Expedition, Duke undergraduates were introduced to a real world traffic citation data set. Provided by Dr. Frank R. Baumgartner, a political scientist at UNC, the data consist of 15 years of traffic stops, with over 18 million observations of 53 variables.

Dr. Guillermo Sapiro, professor in Pratt School of Engineering at Duke University, conducts ongoing autism research. Using image processing, he attempts to program a computer to detect whether babies (around eight to 14 months of age) display a sign of autism. This very early detection enables doctors to train these babies (when their brain plasticity is high) to behave in ways to counter the behavioral limitations autism imposes, thus allowing these babies to act more normally as they grow up. 

Graduate students: Aaron Berdanier and Matt Kwit, University Program in Ecology & Nicholas School of the Environment

Faculty instructors: Rebecca Vidra

Course: ENVIRON 102, Fall 2014

Using social network analysis to predict survival in large-brained mammals.

This data expedition introduced students to “sliding windows and persistence” on time series data, which is an algorithm to turn one dimensional time series into a geometric curve in high dimensions, and to quantitatively analyze hybrid geometric/topological properties of the resulting curve such as “loopiness” and “wiggliness.”

Students learned to visualize high-dimensional gene expression data; understand genetic differences in the context of gene networks; connect genetic differences to physiological outcomes; and perform simple analyses using the R programming language.

Introduce NBA and MLB datasets to undergraduates to help them gain expertise in exploratory data analysis, data visualization, statistical inference, and predictive modeling.

Questions asked: Do males and females scent mark equally? Do lemurs scent mark equally in breeding and non-breeding seasons?

STEM education often presents a very sanitized version of the scientific enterprise. To some extent, this is necessary, but overemphasizing neat-and-tidy results and scripted protocol assignments poses the risk of failing to adequately prepare students for the real-world mess of transforming experimental data into meaningful results. The fundamental aim of this project was to guide students in processing large real-world datasets far beyond their academic comfort zone so as to give them a more realistic understanding of how science works.

What drove the prices for paintings in 18th Century Paris?