NBA and MLB datasets

Project Summary

Introduce NBA and MLB datasets to undergraduates to help them gain expertise in exploratory data analysis, data visualization, statistical inference, and predictive modeling.

Themes and Categories
Paul Bendich

Graduate students: Joe Futoma and Ken McAlinn, PhD students, Statistical Science

Faculty instructor: Mine Cetinkaya-Rundel

Course: STA 112 (Data Science)


  • Assessing home field advantage
  • Determining long term trends
  • Predicting game outcomes

Related Projects

This data expeditions module used three full course sessions to introduce undergraduate hydrology students with minimal programming background to:

  • Public water data (water quantity and chemistry)

  • Spatial analysis of water data

  • 2 core, spatial datasets produced by the USGS that enable spatial analysis

  • The programming language R

  • R based tools for water data

  • Spatial analysis and maps in R

Exposure to local pathogens is a significant selective pressure on the human genome: the strongest selective forces identified in modern human populations are for mutations that confer increased resistance to malaria infection. Understanding how human genetic variation impacts susceptibility to pathogens can reveal important aspects of disease biology and reveal novel treatment targets. By using genome-wide association of infection-related cellular traits, we can connect human genetic variation to disease susceptibility in a controlled laboratory environment. Identification of the variants, genes, and cellular pathways involved in infectious disease pathogenesis can inform host-directed therapeutics, clinically effective risk stratification, and epidemiological prediction. This data expedition explores the effect of host genetic variation on chemokine response to Chlamydia infection.

How does human habitation relate to patterns in the natural environment? How do species respond to the presence of, and changes in, habitation? In this Data Expedition, students make use of public datasets from the Census and the Global Biodiversity Information Facility to examine relationships between individual species and human settlements. Students develop introductory skills in spatial data manipulation and visualization in R, exposure to powerful datasets and tools, and critical thinking skills in assessing dataset quality and bias.