Introduction to Hydrologic Data Analysis Using Scientific Programming Languages

Project Summary

Graduate Students: Kendra Kaiser and John Mallard

Faculty: Michael O’Driscoll

Course: Landscape Hydrology, EOS 323/723

The goals of this exercise were twofold: to introduce students to scientific programming languages and to reinforce hydrological concepts through an assignment that used a publicly available, high-frequency dataset. Although environmental data are almost always analyzed in MATLAB, R, or a similar language, students often lack opportunities to become comfortable with these tools in a setting that provides instructional support. In addition, although hydrology and other environmental sciences depend on high-quality, real-world data, instruction in these subjects often favors conceptual lessons at the expense of exposure to actual environmental data. Our assignment was therefore designed to introduce or reinforce scientific programming languages while using a real-world dataset to reinforce concepts learned in class and to familiarize students with some of the challenges unique to large environmental datasets. The students acquired data, imported them into MATLAB or R, analyzed them, and exported the modified dataset to a common format to be shared along with their script.

Data Expedition Assignment Objectives

1. Identify and acquire publicly available data

2. Demonstrate increased proficiency with scientific programming languages

  • Import data into MATLAB or R
  • Perform analyses on the dataset: manipulate, perform calculations, and plot
  • Export the modified dataset to a common format

3. Reinforce conceptual understanding of water and energy fluxes through interpretation of real-world data

  • Calculate components of the energy balance and discuss temporal dynamics with respect to site-specific environmental conditions
  • Calculate evapotranspiration and potential evapotranspiration using common methods and discuss assumptions associated with each
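
As a rough illustration of the workflow in objectives 2 and 3, the following Python sketch imports flux records, computes an energy balance residual, evapotranspiration, and potential evapotranspiration, and exports the result as CSV. It is not the assignment's own script, which students wrote in MATLAB or R. The column names (NETRAD, H, LE, G, TA), the -9999 missing-value flag, and the half-hourly time step are assumptions about the file layout and should be checked against the downloaded data. The Priestley-Taylor equation stands in here as one common potential evapotranspiration method; the assignment does not specify which methods were used. The sample rows are made up for illustration and are not AmeriFlux measurements.

```python
import csv
import io
import math

LAMBDA_V = 2.45e6     # latent heat of vaporization, J/kg (approximate, near 20 °C)
GAMMA = 0.066         # psychrometric constant, kPa/°C (approximate, near sea level)
ALPHA_PT = 1.26       # Priestley-Taylor coefficient (commonly used default)
MISSING = -9999.0     # assumed missing-value flag
STEP_SECONDS = 1800   # assumed half-hourly records
FLUX_KEYS = ("NETRAD", "H", "LE", "G", "TA")

# Made-up rows for illustration only; not real measurements.
SAMPLE_CSV = """TIMESTAMP_START,NETRAD,H,LE,G,TA
201507011200,500,150,245,50,20
201507011230,520,-9999,260,55,21
201507011300,480,140,230,45,22
"""


def read_flux_records(text):
    """Parse CSV text into dicts of floats, mapping the missing flag to None."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        rec = {"TIMESTAMP_START": row["TIMESTAMP_START"]}
        for key in FLUX_KEYS:
            value = float(row[key])
            rec[key] = None if value == MISSING else value
        records.append(rec)
    return records


def et_from_le(le, seconds=STEP_SECONDS):
    """Convert latent heat flux (W/m^2) to a water depth (mm) over one time step."""
    return le * seconds / LAMBDA_V  # 1 kg of water per m^2 equals 1 mm


def priestley_taylor_pet(netrad, g, ta_c, seconds=STEP_SECONDS):
    """Potential evapotranspiration (mm per time step) by Priestley-Taylor."""
    es = 0.6108 * math.exp(17.27 * ta_c / (ta_c + 237.3))  # saturation vapor pressure, kPa
    delta = 4098.0 * es / (ta_c + 237.3) ** 2              # slope of the es curve, kPa/°C
    return ALPHA_PT * delta / (delta + GAMMA) * (netrad - g) * seconds / LAMBDA_V


def analyze(records):
    """Add derived columns to every record that has no missing inputs."""
    results = []
    for rec in records:
        if any(rec[key] is None for key in FLUX_KEYS):
            continue  # skip gaps here; gap-filling is a separate decision
        out = dict(rec)
        # Energy not accounted for by the measured fluxes, W/m^2
        out["RESIDUAL"] = rec["NETRAD"] - rec["G"] - rec["H"] - rec["LE"]
        out["ET_MM"] = et_from_le(rec["LE"])
        out["PET_MM"] = priestley_taylor_pet(rec["NETRAD"], rec["G"], rec["TA"])
        results.append(out)
    return results


def to_csv_text(rows):
    """Serialize the analyzed rows to CSV text for sharing."""
    fields = ["TIMESTAMP_START", *FLUX_KEYS, "RESIDUAL", "ET_MM", "PET_MM"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv_text(analyze(read_flux_records(SAMPLE_CSV))))
```

To work with a downloaded file instead of the sample text, the file contents can be read into a string and passed to read_flux_records, and the output of to_csv_text can be written to disk. Skipping incomplete records keeps the sketch short; in a real analysis, how gaps are handled is itself worth discussing.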

Data Source

The AmeriFlux network measures carbon, water, and energy fluxes at the ecosystem level across North and South America. These measurements are used to build an understanding of fluxes of energy, water, and nutrients from ecosystems across the western hemisphere and to evaluate ecosystem response to land use and climate change. AmeriFlux is funded by the US Department of Energy to encourage consistent measurements and long-term monitoring of these ecosystem fluxes. Data are collected at individual research sites and uploaded to a central AmeriFlux server, from which researchers and educators can download individual data records. The network consists of 110 active research sites, 44 of which are designated “core” sites that maintain specific, high standards of data collection.

Concluding Remarks

We assessed students’ experience with programming (in any language) using a pre-assignment survey. Half of the students had no experience with coding, and of those who did, only two rated themselves as proficient. We held two lab sessions in which we introduced students to the programming platforms and were on hand to answer specific questions about the assignment, and we then extended office hours to help along the way. As a result, all students were able to get the help they needed on the assignment. In a post-assignment questionnaire, over half of the class rated themselves as having moderate experience with coding (over 3 on a scale from 1 to 5), and none of the students rated themselves lower than 2.

However, this method of intensive instructional support required a significant time commitment from both instructors. In the future, requiring students to watch online tutorials and gradually introducing the programs throughout the semester would help students get past the initial learning curve. We also found that students did a good job of working through coding issues and questions in small groups. Creating opportunities for them to do this (e.g., booking a computer lab for them, or assigning or encouraging formal groups) would be beneficial.

