Based on data sets that draw from such disparate sources as medieval poetry, African American novels, recordings of FDR's fireside chats, and photojournalistic images, these projects train our students to apply statistical analysis to humanistic questions.
Data Expeditions in the Humanities
This two-week teaching module in an introductory-level undergraduate course invites students to explore the power of Twitter in shaping public discourse. The project supplements the close-reading methods central to the humanities with large-scale social media analysis. The exercise challenges students to consider how applying visualization techniques to a dataset too vast for manual apprehension might enable them to identify smaller subsets of data and individual tweets for granular inspection, as well as to determine which factors do not lend themselves to close reading at all. Employing an original dataset of almost one million tweets focused on the contested 2018 Florida midterm elections, students develop skills in using visualization software, generating research questions, and creating novel visualizations to answer those questions. They then evaluate and compare the affordances of large-scale data analytics with investigation of individual tweets, and draw on their findings to debate the role of social media in shaping public conversations surrounding major national events. This project was developed as a collaboration among the English Department (Emma Davenport and Astrid Giugni), Math Department (Hubert Bray), Duke University Library (Eric Monson), and Trinity Technology Services (Brian Norberg).
Understanding how to generate, analyze, and work with datasets in the humanities is often difficult without learning how to code or program. In humanities-centered courses, we often privilege close reading or qualitative analysis over other ways of knowing, but learning new quantitative techniques better prepares students to tackle new forms of reading. This class works with data from the HathiTrust to develop ideas for thinking about how large groups and different discourse communities thought of queens of antiquity such as Cleopatra and Dido.
Please refer to https://sites.duke.edu/queensofantiquity/ for more information.
This Data Expedition introduces students to network tools and approaches and invites them to consider the relationship(s) between social networks and social imaginaries. Using foundation-funding data collected from The Foundation Directory Online, the Data Expedition enables students to visualize and explore the relationships among networks, social imaginaries, and funding for higher education. The Data Expedition is based on two sets of data. The first set lists the grants received by Duke University in 2016 from five foundations: The Bill and Melinda Gates Foundation, Fidelity Charitable Gift Fund, Silicon Valley Community Foundation, The Community Foundation of Western North Carolina, and The Robert Wood Johnson Foundation. The second set lists the names of board members from Duke University and each of these five foundations, along with the degree-granting institution for each member's undergraduate education. For the sake of this exercise, the degree-granting institution data was fabricated from a randomized list of the top twenty-five undergraduate institutions.
Data+ in the Humanities
Annie Xu (Rice, CEE), Liuren Yin (ECE), and Zoe Zhu (Data Science) spent ten weeks analyzing usage data for MorphoSource, a publicly available 3D data repository maintained by Duke University. Working with Python and Tableau, the team developed an interactive dashboard that allows MorphoSource staff to explore usage patterns for site visitors who view 3D files representing objects from primate skulls to historical art pieces.
Project Leads: Doug Boyer, Julia Winchester
After London was destroyed during the Great Fire of 1666, it was reconstructed into the “emerald gem of Europe,” a utopian epicenter focused on England’s political and economic interests. For whom was the utopia constructed? Who determined its architectural choices? And what did such a utopia look like in seventeenth-century London?
Our research uses Natural Language Processing to analyze semantic trends in digitized text from the online database “Early English Books Online” (EEBO-TCP https://textcreationpartnership.org/tcp-texts/eebo-tcp-early-english-books-online/) to answer such questions. After applying methods such as word embedding, sentiment analysis, and hapax richness, we provide an overview of themes in the seventeenth century, focusing on case studies of changes to coal taxes within the period and the reconstruction of St Paul's Cathedral. Our results show that, while a utopian society was originally intended to be built for the people, the project's motivation eventually shifted to a political purpose, as evidenced by the approval of more costly city projects. In response to backlash against the increase of taxes on coal to support large-scale building projects, the ruling class highlighted positive outcomes in printed materials in order to convince working-class readers that their collected taxes contributed to a greater good, despite evidence to the contrary. Finally, during key historical events, sentiment and hapax richness are shown to have an inverse relationship, which can demonstrate how London writers engaged with text and genre as forms of protest.
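Of the measures named above, hapax richness is the simplest to illustrate. A minimal sketch, assuming the common definition of richness as the share of word types that occur exactly once (definitions vary; some divide by total tokens instead, and real pipelines tokenize more carefully than `split()`):

```python
from collections import Counter

def hapax_richness(text: str) -> float:
    """Share of distinct word types that occur exactly once (hapax legomena)."""
    counts = Counter(text.lower().split())
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts) if counts else 0.0

# Five distinct words, four of which appear once
print(hapax_richness("the fire consumed the city entirely"))  # → 0.8
```

Tracked over time, a rising score suggests more varied, one-off vocabulary; a falling score suggests more repetitive, formulaic language.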
Is there a right type and amount of consumption? The idea of ethical consumption has gained prominence in recent discourse, both in terms of what we purchase (from fair trade coffee to carbon off-sets) and how much we consume (from rechargeable batteries to energy efficient homes).
Heidi Smith (CS, English) and Biniam Garomsa (Data Science, Math) spent ten weeks building tools to assist the David M. Rubenstein Rare Book and Manuscript Library's mission of finding and describing historically marginalized voices within its collections. The team performed extensive data wrangling on the card catalog, including modern optical character recognition techniques, and then conducted demographic and topic-modeling analyses of the results. Final deliverables to library professionals included a structured dataset, an interactive web app, and a search tool.
Project Lead: Meghan Lyon
Project Manager: Anna Holleman
Mapping History has focused on the categorizing, labeling, digitization, and 3D reconstruction of 16th- and 17th-century maps and atlases of London and Lisbon. Over the course of the summer, the Mapping History team developed its own unique analytical dataset by painstakingly labeling every element contained within these maps, used Python to digitize this dataset, and, now in the project's final stage, has begun reconstructing these historical perspectives in a 3D game engine.
Hate groups such as the Alt-Right gained mainstream visibility in contemporary political culture during the Unite the Right rally in Charlottesville, VA, in 2017. This project explores methods to quantify the presence of Latinxs within the Alt-Right, particularly how they racialize themselves in a space that often spews hate toward Mexicans and other marginalized groups from Latin America. Using data from multiple sources (such as Twitter, Stormfront, and Breitbart), we developed a corpus of tweets, subthreads, and articles, and analyzed this data using basic natural language processing (NLP) techniques.
We apply word embedding models to corpora from the start of the Early Modern period, when the market economy began to dramatically expand in England. Word embedding models use neural networks to map words to vectors so that semantic relationships between words are preserved in the vectors' geometry. Such models have been successful in uncovering cultural trends and stereotypes in large corpora of texts, but these techniques are infrequently applied to texts dating much farther back than the 19th century. Using newly developed methods for analyzing word embeddings, we track the development of the meanings of words related to consumerism, including their relationships with gender over time.
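To make the geometric intuition concrete: related words end up with similar vectors, which is typically measured with cosine similarity. In this toy sketch the three-dimensional vectors and word choices are invented for illustration; real models such as word2vec learn vectors of hundreds of dimensions from the corpus itself:

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-picked toy embeddings (illustrative only)
vectors = {
    "market":   [0.9, 0.1, 0.2],
    "commerce": [0.8, 0.2, 0.3],
    "queen":    [0.1, 0.9, 0.7],
}

print(cosine(vectors["market"], vectors["commerce"]))  # high: related senses
print(cosine(vectors["market"], vectors["queen"]))     # low: unrelated senses
```

Tracking how such similarities shift between corpora from different decades is one way to quantify changes in a word's meaning over time.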
Led by Dr. Eva Wheeler, this project considers how racial language in African American literature and film is rendered for international audiences and traces the spread of these translations. To address the study's primary questions, the team analyzed a preliminary dataset and explored the relationship between translation strategy and different categories of racial language. The team also conducted a macro-level analysis of the linguistic, temporal, and geographic spread of African American stories using the IMDb and WorldCat databases. We have found substantial variation in how African American stories are rendered, which can in part be explained through a social scientific lens.
The Middle Passage, the route by which most enslaved persons were brought across the Atlantic to North America, is a critical locus of modern history—yet it has been notoriously difficult to document or memorialize. The ultimate aim of this project is to employ the resources of digital mapping technologies as well as the humanistic methods of history, literature, philosophy, and other disciplines to envision how best to memorialize the enslaved persons who lost their lives between their homelands and North America. To do this, the students combined previously disparate data and archival sources to discover where on their journeys enslaved persons died. Because of the nature of the data itself and the history it represents, the team engaged in ongoing conversations about various ways of visualizing its findings, and continuously evaluated the ethics of the data's provenance and their own methodologies and conclusions. A central goal for the students was to discover what contribution digital data analysis methods could make to the project of remembering itself.
The aim of this project was to explore how U.S. mass media—particularly newspapers—enlists text and imagery to portray human rights, genocide, and crimes against humanity from World War II to the present. From the Holocaust to Cambodia, from Rwanda to Myanmar, such representation has political consequences. Coined by Raphael Lemkin, a Polish lawyer who fled Hitler’s antisemitism, the term “genocide” was first introduced to the American public in a Washington Post op-ed in 1944. Since its legal codification in the 1948 United Nations Convention on the Prevention and Punishment of the Crime of Genocide, the term has circulated, been debated, been used to describe events that pre-date it (such as the displacement and genocide of Native People in the Americas), and been shaped by numerous forces—especially the words and images published in newspapers. Alongside the definition of “genocide,” other key concepts, notably “crimes against humanity,” have attempted to label, and thus narrate, targeted mass violence. Conversely, the concept of “human rights,” enshrined in the 1948 UN Declaration, seeks to name a presence of rights rather than their absence.
During the summer, the team focused their work on evaluating the language used in Western media to represent instances of genocide and how such language varied based on the location and time period of the conflict. In particular, the team’s efforts centered on Rwanda and Bosnia as important case studies, affording them the chance to compare nearly simultaneous reporting on two well-known genocides. The language used by reporters in these two cases showed distinct polarizations of terminology (for instance, while “slaughter” was much more common than “murder” in discussions of the Rwanda genocide, the inverse was true for Bosnia).
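At its core, a comparison like this reduces to relative term frequencies across corpora. A minimal sketch, with invented one-line stand-ins for the two bodies of reporting (the real corpora were full newspaper archives):

```python
def per_10k(term: str, text: str) -> float:
    """Occurrences of a term per 10,000 words of a corpus."""
    words = text.lower().split()
    return 10_000 * words.count(term) / len(words)

# Invented stand-ins for the two bodies of reporting
rwanda_reporting = "reports described the slaughter of civilians as the slaughter spread"
bosnia_reporting = "reports described the murder of civilians as the murder continued"

print(per_10k("slaughter", rwanda_reporting))  # high in the Rwanda stand-in
print(per_10k("slaughter", bosnia_reporting))  # absent from the Bosnia stand-in
```

Normalizing by corpus size (here, per 10,000 words) is what makes counts from archives of different sizes comparable.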
Faculty Leads: Nora Nunn, Astrid Giugni
How Much Profit is Too Much Profit?
Chris Esposito (Economics), Ruoyu Wu (Computer Science), and Sean Yoon (Masters, Decision Sciences) spent ten weeks building tools to investigate the historical trends of price gouging and excess profits taxes in the United States of America from 1900 to the present. The team used a variety of text-mining methods to create a large database of historical documents, analyzed historical patterns of word use, and created an interactive R Shiny app to display their data and analyses.
(cartoon from The Masses July 1916)
Faculty Lead: Sarah Deutsch
Project Manager: Evan Donahue
The students in this project worked on a pervasive question in literary, film, and copyright studies: how do we know when a new work of fiction borrows from an older one? Many times, works are appropriated, rather than straightforwardly adapted, which makes them difficult for human readers to trace. As we continue to remake and repurpose previous texts into new forms that combine hundreds of references to other works (such as Ready Player One), it becomes increasingly laborious to track all the intertextual elements of a single text. While some borrowings are easy to spot, as in the case of Marvel films that are straightforward adaptations of comic book storylines and aesthetics, others are more subtle, as when Disney reinterpreted Hamlet and African oral traditions to create The Lion King. Thousands of new stories are created each day, but how do we know if we are borrowing or appropriating a previous text? Are there works that have adapted previous ones that we have yet to identify?
Nathan Liang (Psychology, Statistics), Sandra Luksic (Philosophy, Political Science), and Alexis Malone (Statistics) began their 10-week project as an open-ended exploration of how women are depicted, both physically and figuratively, in women's magazines, seeking to consider what role magazines play in the imagined and real lives of women.
In tracing the publication history, geographical spread, and content of “pirated” copies of Daniel Defoe’s Robinson Crusoe, Gabriel Guedes (Math, Global Cultural Studies), Lucian Li (Computer Science, History), and Orgil Batzaya (Math, Computer Science) explored the complications of a data set whose spelling and grammar changed drastically over three centuries, which posed new challenges for data cleanup. By asking how effective “distant reading” techniques are for comparing thousands of different editions of Robinson Crusoe, the students learned to think about the appropriateness of computational methods such as doc2vec and topic modeling. Through these methods, the students began to ask: at what point does one start seeing patterns that were invisible at a human scale of reading (one book at a time)? While the project did not definitively answer these questions, it did provide paths for further inquiry.
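The spelling problem the team faced can be illustrated with a toy vocabulary-overlap measure. This is far simpler than the doc2vec and topic-modeling methods the project actually used, and the normalization rule (early printers often set "vv" for "w") and sample lines are illustrative only:

```python
import re

def word_types(text: str) -> set[str]:
    """Lowercase word types, with one toy early-modern spelling fix (vv -> w)."""
    return set(re.findall(r"[a-z]+", text.lower().replace("vv", "w")))

def edition_overlap(a: str, b: str) -> float:
    """Jaccard overlap of two editions' vocabularies (0 = disjoint, 1 = identical)."""
    ta, tb = word_types(a), word_types(b)
    return len(ta & tb) / len(ta | tb)

# Invented snippets imitating an early and a modernized edition
early = "I vvas born in the citie of York"
modern = "I was born in the city of York"
print(edition_overlap(early, modern))  # high, but 'citie'/'city' still divides them
```

Without normalization, spelling drift alone makes genuinely similar editions look textually distant, which is why cleanup had to precede any distant-reading comparison.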
The team published their results at: https://orgilbatzaya.github.io/pirating-texts-site/
Ashley Murray (Chemistry/Math), Brian Glucksman (Global Cultural Studies), and Michelle Gao (Statistics/Economics) spent 10 weeks analyzing how the meaning and use of the word “poverty” changed in presidential documents from the 1930s to the present. The students found that American presidential rhetoric about poverty has shifted in measurable ways over time. Presidential rhetoric, however, doesn’t necessarily affect policy change. As Michelle Gao explained, “The statistical methods we used provided another more quantitative way of analyzing the text. The database had around 130,000 documents, which is pretty impossible to read one by one and get all the poverty related documents by brute force. As a result, web-scraping and word filtering provided a more efficient and systematic way of extracting all the valuable information while minimizing human errors.” Through techniques such as linear regression, machine learning, and image analysis, the team effectively analyzed large swaths of textual and visual data. This approach allowed them to zero in on significant documents for closer and more in-depth analysis, paying particular attention to documents by presidents such as Franklin Delano Roosevelt and Lyndon B. Johnson, both leaders in what LBJ famously called “The War on Poverty.”
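The word-filtering step Gao describes can be sketched in a few lines. The keyword list and sample documents below are invented for illustration; the actual project worked with roughly 130,000 scraped presidential documents:

```python
# Illustrative keyword list; a real study would curate this carefully
POVERTY_TERMS = {"poverty", "poor", "impoverished", "welfare"}

def poverty_related(documents: list[str]) -> list[str]:
    """Keep only documents mentioning at least one poverty-related term."""
    return [d for d in documents if POVERTY_TERMS & set(d.lower().split())]

docs = [
    "Today we declare unconditional war on poverty in America.",
    "The budget projects steady growth in manufacturing output.",
]
print(poverty_related(docs))  # → only the first document survives the filter
```

Filtering first shrinks the corpus to a tractable subset, after which the closer statistical and close-reading analyses can focus on documents that actually discuss poverty.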
Robbie Ha (Computer Science, Statistics), Peilin Lai (Computer Science, Mathematics), and Alejandro Ortega (Mathematics) spent ten weeks analyzing the content and dissemination of images of the Syrian refugee crisis, as part of a general data-driven investigation of Western photojournalism and how it has contributed to our understanding of this crisis.
Selen Berkman (ECE, CompSci), Sammy Garland (Math), and Aaron VanSteinberg (CompSci, English) spent ten weeks undertaking a data-driven analysis of the representation of women in film and in the film industry, with special attention to a metric called the Bechdel Test. They worked with data from a number of sources, including fivethirtyeight.com and the-numbers.com.
Liuyi Zhu (Computer Science, Math), Gilad Amitai (Masters, Statistics), Raphael Kim (Computer Science, Mechanical Engineering), and Andreas Badea (East Chapel Hill High School) spent ten weeks streamlining and automating the process of electronically rejuvenating medieval artwork. They used a 14th-century altarpiece by Francescuccio Ghissi as a working example.
Spenser Easterbrook, a Philosophy and Math double major, joined Biology majors Aharon Walker and Nicholas Branson in a ten-week exploration of the connections between journal publications from the humanities and the sciences. They were guided by Rick Gawne and Jameson Clarke, graduate students from Philosophy and Biology.