Data Analysis for Ecological Modeling

Project Summary

Ecological data comes in many shapes and sizes. A single ecological study commonly combines population data (such as snail counts) with continuous sensor data (such as stream temperature, with 35,000 data points collected each year!). Ecologists must reconcile data collected at different spatial and temporal scales in order to make inferences about their study systems. Luckily, there are standard practices and toolsets for doing so. In this data expedition, we ingest data collected in the field by various methods, then arrange and query it in formats that can be analyzed. We then choose plot types, data transformations and statistical tests appropriate to each type of data. We examine both field data collected by students and large open-source datasets that can be scraped from the web and analyzed locally.


Each year, Field Ecology students measure physical, chemical, and biological characteristics of the Eno River. The Eno River has also been continuously monitored for numerous environmental parameters as part of the StreamPulse project (Duke and other collaborators worldwide). StreamPulse collects data from in-stream sensors, such as temperature and dissolved oxygen, to estimate ecosystem processes such as metabolism. We are therefore able to compare data collected in the field course with long-term monitoring efforts.
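
Comparing a handful of field observations with a continuous sensor record usually means bringing both to the same time scale first. The sketch below is one minimal way to do that: it averages sensor readings by calendar day and pairs each field count with that day's mean. The course itself works in R; Python is used here only for illustration. The function names (`daily_means`, `pair_with_counts`) and all values are made up for the example and are not Eno River or StreamPulse data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(readings):
    """Average (ISO timestamp, value) sensor readings by calendar date."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        day = datetime.fromisoformat(timestamp).date()
        by_day[day].append(value)
    return {day: mean(values) for day, values in by_day.items()}


def pair_with_counts(daily, counts):
    """Attach the daily sensor mean to each field count taken on a day with sensor data."""
    paired = []
    for date_text, count in counts:
        day = datetime.fromisoformat(date_text).date()
        if day in daily:  # skip field days with no sensor coverage
            paired.append((day.isoformat(), count, daily[day]))
    return paired


# Made-up illustrative values, not real measurements.
sensor_readings = [
    ("2019-09-10T00:00:00", 21.0),
    ("2019-09-10T12:00:00", 23.0),
    ("2019-09-11T00:00:00", 20.0),
    ("2019-09-11T12:00:00", 22.0),
]
snail_counts = [("2019-09-10", 14), ("2019-09-12", 9)]

daily = daily_means(sensor_readings)
print(pair_with_counts(daily, snail_counts))
```

Field days without sensor coverage are dropped rather than filled in, so the paired table only contains dates where both kinds of data exist.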

Graduate Students: Emily Ury and Alice Carter

Faculty: Dr. Justin Wright (Dr. Emily Bernhardt helped with the original proposal)

Undergraduate Course: "Field Ecology" (BIO 361) 


Part 1: Observations at the Eno River

  • Students will learn how to ingest their own data into the R programming environment

  • Students will become familiar with different types of ecological data

  • Students will use linear regression and multiple linear regression to examine relationships in ecological data and make predictions

  • Students will try different transformations and statistical tests to examine their data
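
As a rough illustration of the regression and transformation objectives above, the sketch below fits a simple linear regression by ordinary least squares, once on raw counts and once on log-transformed counts, and compares the fits by R². The class does this in R; this Python version is only a sketch. The helper names (`fit_line`, `r_squared`), the choice of water depth as the predictor, and all data values are assumptions made for the example. Multiple regression is not shown.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Fraction of the variance in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up illustrative values: water depth (cm) and snail counts per plot.
depth_cm = [5, 10, 15, 20, 25, 30]
snails = [40, 22, 13, 6, 4, 2]

# log(count + 1) keeps zero counts defined and often linearizes count data.
log_snails = [math.log1p(count) for count in snails]

for label, response in (("raw", snails), ("log(count + 1)", log_snails)):
    intercept, slope = fit_line(depth_cm, response)
    fit = r_squared(depth_cm, response, intercept, slope)
    print(f"{label}: intercept={intercept:.3f}, slope={slope:.3f}, R^2={fit:.3f}")
```

Comparing R² across the two fits is only a starting point; residual plots are a better guide to whether a transformation is appropriate.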

Part 2: A Year of Eno River Data

  • Students will explore the StreamPulse project data platform and R package

  • Students will download and examine a year of Eno River monitoring data

  • Students will begin to examine how long-term monitoring data helps interpret field observations in ecological analysis

Here are some examples of the plots we made:

Binning stream parameters to understand population distributions:

Snail distribution graph
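
Binning of this kind can be sketched in a few lines: group a continuous stream parameter into fixed-width bins and total the counts that fall in each one. The example is in Python for illustration only (the class used R). The parameter (water depth), the bin width, the function name `bin_counts`, and the values are all hypothetical.

```python
import math


def bin_counts(values, counts, bin_width):
    """Sum counts within fixed-width bins, keyed by each bin's lower edge."""
    totals = {}
    for value, count in zip(values, counts):
        lower_edge = math.floor(value / bin_width) * bin_width
        totals[lower_edge] = totals.get(lower_edge, 0) + count
    return dict(sorted(totals.items()))


# Made-up illustrative values: water depth (cm) at each plot and snails counted there.
depth_cm = [4, 12, 18, 25, 27, 33]
snails = [2, 5, 7, 3, 4, 1]

print(bin_counts(depth_cm, snails, bin_width=10))
```

Each key is the lower edge of a 10 cm bin, so the result can be passed straight to a bar plot.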

Trying out various visual, statistical and modeling approaches:

Different modeling approaches graph

Graph of a year of oxygen levels at the Eno River

Student Feedback

“I’ve never used R before, so I learned how to input data, make plots, and do regression analyses (single + multiple)...Stream data was really cool!”

“Thank you! Super helpful. Always so much to learn with R.”

“[I] learned how to fit a linear trendline to a graph.”

“I learned how to customize the data I am working with.”

Student feedback cards

Attached materials for the lesson

Two R markdown files:

One data file:

Photos from the class

Students in class

