Data & Digital Humanities

The humanities-based projects within iiD couple big data analysis with the interpretive work usually done by humanists.

The data sets for these projects include collections of texts, images, videos, and audio—in other words, they are digital archives broadly understood. From analyzing the numerous editions of Defoe’s Robinson Crusoe to understanding the narrative created by the thousands of photojournalistic depictions of Syrian refugees to virtually restoring medieval art, these groups ask traditional humanistic questions, but explore them with quantitative as well as qualitative analysis.

Discussing a Data+ and Digital Humanities project

Why Data+ for the Humanities?

These humanities projects originate from English, art history, and mathematics faculty and graduate students. The sponsors and mentors direct projects that represent the historical, methodological, and theoretical interests of their own research and teaching areas. But by developing these projects through Data+, they are able to work collaboratively with undergraduate students to meet time-consuming technical and computational challenges through skill sets that are often outside the usual humanities repertoire. At the same time, undergraduate students are introduced to humanistic studies outside of the usual classroom setting, learning how to work attentively and closely with archives and conceptual tools for ten weeks over the summer.

Data+ Projects

Nathan Liang (Psychology, Statistics), Sandra Luksic (Philosophy, Political Science), and Alexis Malone (Statistics) spent ten weeks using tools from text and image analytics to understand the evolving representations of women on magazine covers. They worked with a large collection of magazine covers from Duke’s library archive.

Click here to read the Executive Summary

Gabriel Guedes (Math, Global Cultural Studies), Lucian Li (Computer Science, History), and Orgil Batzaya (Math, Computer Science) spent ten weeks using text analytics and interactive mapping tools to understand the geographic spread of 1,482 versions and editions of the Robinson Crusoe story. They worked with data provided by HathiTrust, the University of Florida, and the Internet Archive.

Click here for the Executive Summary

Ashley Murray (Chemistry/Math), Brian Glucksman (Global Cultural Studies), and Michelle Gao (Statistics/Economics) spent ten weeks using sentiment and image analysis to understand semantic shifts in the way American presidents have used the term “poverty,” stretching from the 1930s to the present day.

In addition to many YouTube videos of presidential speeches, they made use of a large archive of presidential addresses provided by the American Presidency Project.

Click here for the Executive Summary

Robbie Ha (Computer Science, Statistics), Peilin Lai (Computer Science, Mathematics), and Alejandro Ortega (Mathematics) spent ten weeks analyzing the content and dissemination of images of the Syrian refugee crisis, as part of a general data-driven investigation of Western photojournalism and how it has contributed to our understanding of this crisis.

Selen Berkman (ECE, CompSci), Sammy Garland (Math), and Aaron VanSteinberg (CompSci, English) spent ten weeks undertaking a data-driven analysis of the representation of women in film and in the film industry, with special attention to a metric called the Bechdel Test. They worked with data from a number of sources, including fivethirtyeight.com and the-numbers.com.

Liuyi Zhu (Computer Science, Math), Gilad Amitai (Masters, Statistics), Raphael Kim (Computer Science, Mechanical Engineering), and Andreas Badea (East Chapel Hill High School) spent ten weeks streamlining and automating the process of electronically rejuvenating medieval artwork. They used a 14th-century altarpiece by Francescuccio Ghissi as a working example.

Spenser Easterbrook, a Philosophy and Math double major, joined Biology majors Aharon Walker and Nicholas Branson in a ten-week exploration of the connections between journal publications from the humanities and the sciences. They were guided by Rick Gawne and Jameson Clarke, graduate students from Philosophy and Biology.

Data Expeditions Projects

This data expedition introduced students to “sliding windows and persistence” on time series data: an algorithm that turns a one-dimensional time series into a geometric curve in high dimensions, then quantitatively analyzes hybrid geometric/topological properties of the resulting curve, such as “loopiness” and “wiggliness.”
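The first half of that idea, the sliding window step, can be sketched in a few lines of Python. This is an illustrative sketch only, not the material used in the expedition: the function name `sliding_window` and the parameters `dim` (window length) and `tau` (spacing between samples in a window) are assumptions made for this example. The persistence step, which would typically measure "loopiness" with a persistent homology library, is not shown.

```python
import numpy as np

def sliding_window(x, dim, tau=1):
    """Turn a 1-D series into points in R^dim.

    Row i of the result is (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).
    The name and parameters are hypothetical, chosen for this sketch.
    """
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("series is too short for this window")
    # Build an index grid: one row per window, one column per coordinate.
    idx = np.arange(n_points)[:, None] + tau * np.arange(dim)[None, :]
    return x[idx]

# A periodic signal: five periods of a sine wave, 40 samples per period.
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 40)

# With dim=2 and tau equal to a quarter period, each point is
# (sin(theta), cos(theta)), so the curve traces the unit circle.
cloud = sliding_window(signal, dim=2, tau=10)
radii = np.linalg.norm(cloud, axis=1)
```

In this toy case every point of `cloud` lies at distance 1 from the origin, so the periodic signal has become a closed loop. That loop is the kind of shape the persistence step is meant to detect and quantify; larger values of `dim` give curves in higher dimensions.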

What drove the prices for paintings in 18th Century Paris?