Lemurs and Big Data: Learning to Use Big Data in Research

Everybody loves lemurs. Some people love lemurs and data. Thanks to one of Duke’s Information Initiative Data Expeditions projects, a group of Duke evolutionary anthropology students recently learned a lot more about lemurs and about how data can serve as a powerful research tool.

Photo: “Nycticebus pygmaeus 002” by David Haring / Duke Lemur Center. Licensed under CC BY-SA 3.0 via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:Nycticebus_pygmaeus_002.jpg

Data Expeditions projects focus on introducing students to exploratory data analysis. In Fall 2015, PhD candidates Kendra Smyth and Lydia Greene from the Nicholas School led a Data Expeditions workshop in Dr. Leslie Digby’s senior-level Advanced Research in Evolutionary Anthropology class. The goal of the workshop was to familiarize students with the R programming language and to introduce them to a range of statistical techniques they could use to analyze their own senior thesis data.

In the workshop, students used a lemur scent-marking dataset that Greene compiled for her undergraduate honors thesis at Duke.

“By using these data, we aimed to make statistics seem both accessible and relatable to these students,” Greene said.

In a single workshop session, Smyth and Greene showed the group how to use the software and work with the data. Just as importantly, they conveyed one of the central messages of their Data Expedition project: software programs like R are incredibly powerful at churning out statistics, but they blindly follow the code they are given and will run a test even when its assumptions are violated.

“This point is incredibly important in statistics, as researchers must be smarter than their programs and fully comprehend the assumptions of their chosen analytical approach,” Smyth said.
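
To make that point concrete, here is a minimal sketch. It is written in Python using only the standard library rather than in R, and it is not taken from Smyth and Greene’s workshop materials. It computes Student’s two-sample t statistic, which assumes the two groups share the same variance, and shows that the arithmetic runs without complaint on data that clearly break that assumption. The numbers are invented toy values, not lemur data. The helper names and the variance-ratio cutoff of 4 are hypothetical choices for illustration; the cutoff is a common rule of thumb, not a formal test.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic; assumes both groups share one variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def variance_ratio(a, b):
    """Larger sample variance divided by the smaller one (a crude equal-variance check)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


def checked_pooled_t(a, b, max_ratio=4.0):
    """Refuse to compute the pooled t statistic when the variances look too different.

    max_ratio=4.0 is an assumed rule-of-thumb cutoff, not a formal test.
    """
    ratio = variance_ratio(a, b)
    if ratio > max_ratio:
        raise ValueError(
            f"variance ratio {ratio:.1f} exceeds {max_ratio}; "
            "the equal-variance assumption looks violated"
        )
    return pooled_t_statistic(a, b)


# Invented toy measurements (NOT real lemur data): same sample size, very different spreads.
group_a = [2, 3, 3, 4, 3, 2, 4, 3]
group_b = [1, 9, 0, 12, 2, 15, 1, 8]

# The unchecked calculation returns a number with no warning at all...
print(f"pooled t = {pooled_t_statistic(group_a, group_b):.3f}")

# ...whereas the checked version stops and explains why.
try:
    checked_pooled_t(group_a, group_b)
except ValueError as err:
    print(f"check failed: {err}")
```

The check does not fix the analysis. It only forces the researcher to notice the problem and choose a more suitable test. In R, a comparable choice is the `var.equal` argument of `t.test()`, which defaults to `FALSE` and so runs Welch’s test unless the user asserts equal variances.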

Dr. Digby also participated in the workshop and intends to make the lesson a part of her future classes.

“It was great having Lydia and Kendra take over the class for a day,” Dr. Digby said. “Most of my students had already learned some basic aspects of R in their statistics courses, and Lydia and Kendra quickly took advantage of this prior knowledge and gave the students a chance to take things to a higher level, applying their knowledge to data and questions that parallel what they are doing for their senior thesis research.

“In other words, the analyses got very sophisticated very quickly, but the students kept up and did a great job with the mini-project.”

Smyth and Greene also think the scripts and lesson plans could be incorporated into other Evolutionary Anthropology courses. For example, a statistics lab within the introductory course could bring statistics and big data to students just starting out on their science journeys.