Spatially explicit surface water quality analysis in river networks: linking public water quality data to watersheds and network flowlines

Project Summary

This data expeditions module used three full course sessions to introduce undergraduate hydrology students with minimal programming background to:

  • Public water data (water quantity and chemistry)

  • Spatial analysis of water data

  • Two core spatial datasets produced by the USGS that enable spatial analysis

  • The programming language R

  • R-based tools for water data

  • Spatial analysis and maps in R


Faculty sponsor: Dr. Kateri Salk (Nicholas School)
Graduate student: Nicholas Bruns (3rd-year PhD student, river ecology)
Course: EOS.323, Landscape Hydrology for undergraduates (instructor: Salk)


Water science now has an extensive, standardized ecosystem for accessing and analyzing water data. The USGS has led this initiative through four actions:

  1. Publishing its own high-quality public data
  2. Joining data from other federal, state, local, tribal, and private sector entities, and publishing “harmonized” public datasets cast into USGS standards
  3. Developing R interfaces for public data
  4. Developing other R tools for working with its published data

Our module aimed to introduce students to this ecosystem. We also emphasized spatial analysis because many core problems in water science require a spatial perspective. For example, managers aiming to improve water quality in a lake may aim to reduce the nutrients in the river entering that lake. These nutrients entered the waterway somewhere in the upstream watershed, for instance from an outdated septic system. Reducing the nutrients entering the lake therefore requires replacing that outdated septic system, which in turn requires pinpointing its location.
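The upstream reasoning in this example can be sketched in a few lines of base R. This is a minimal illustration, not the module's actual code: the flowline ids, the `to_id` column, the sample values, and the helper `upstream_of()` are all hypothetical, and a real analysis would draw the network and water chemistry from published datasets rather than hand-built tables.

```r
# Toy river network: each flowline drains into the flowline named in `to_id`.
# All ids and values below are made up for illustration.
flowlines <- data.frame(
  id    = c("A", "B", "C", "D", "E", "F"),
  to_id = c("C", "C", "E", "E", "LAKE", "LAKE"),
  stringsAsFactors = FALSE
)

# Hypothetical nitrate samples (mg/L), each tied to the flowline it was taken on.
samples <- data.frame(
  site         = c("s1", "s2", "s3", "s4", "s5"),
  flowline     = c("A", "B", "D", "E", "F"),
  nitrate_mg_l = c(0.4, 3.1, 0.6, 1.5, 0.3),
  stringsAsFactors = FALSE
)

# Collect every flowline that eventually drains into `target`
# by repeatedly stepping one link upstream until nothing new is found.
upstream_of <- function(network, target) {
  found <- character(0)
  frontier <- target
  while (length(frontier) > 0) {
    parents <- network$id[network$to_id %in% frontier]
    parents <- setdiff(parents, found)
    found <- c(found, parents)
    frontier <- parents
  }
  found
}

# Flowline "E" plays the role of the river entering the lake.
river <- "E"
contributing <- c(river, upstream_of(flowlines, river))

# Keep only samples taken on contributing flowlines, highest nitrate first.
candidates <- samples[samples$flowline %in% contributing, ]
hotspots <- candidates[order(-candidates$nitrate_mg_l), ]
print(hotspots)
```

In this toy network, flowline F also drains to the lake but not through river E, so its sample is excluded; the highest remaining nitrate value sits on flowline B, which is where a manager would look first.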

Sessions 1 and 2 worked through RMarkdown files. Rather than using virtual machines, all students installed R and set up a data environment on their own machines. Each session began with an overview lecture, proceeded to walking through and running code chunks together, and finished with activities to assess comprehension. We assessed comprehension with questions that required students to modify and rerun existing code. This "template-based", or "cut-and-paste", approach was successful, especially in our second session, because it let students with minimal coding experience see the potential of programming, public data, and programming-based spatial analysis.

That approach, however, was designed for, and indeed required, in-person circulation by the instructors to troubleshoot installation differences. Therefore, when our third planned session fell in the unexpectedly remote conditions of spring 2020, we shifted from using R as planned to using a pre-built tool for interacting with water chemistry data. Specifically, we used a wonderful visualization dashboard that began as an IID Data+ project.

Our final sessions emphasized the “so what”: what would we want to do with the data?


Project summary (PDF)



Session 2 slides (PPT)

Session 3 slides (PPT)
