Pirating Texts

Project Summary

In tracing the publication history, geographical spread, and content of “pirated” copies of Daniel Defoe’s Robinson Crusoe, Gabriel Guedes (Math, Global Cultural Studies), Lucian Li (Computer Science, History), and Orgil Batzaya (Math, Computer Science) worked with a data set whose spelling and grammar changed drastically over three centuries, which posed new challenges for data cleanup. By questioning how effective “distant reading” techniques are for comparing thousands of different editions of Robinson Crusoe, the students learned to weigh the appropriateness of computational methods such as doc2vec and topic modeling. Through these methods, the students began to ask: at what point does one start seeing patterns that are invisible at a human scale of reading (one book at a time)? While the project did not definitively answer these questions, it did open paths for further inquiry.
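To make the cleanup problem concrete, here is a minimal sketch of why spelling normalization matters before editions are compared. It scores two passages with bag-of-words cosine similarity, which is a deliberately simpler stand-in for doc2vec and not the team's actual method. The spelling map, the sample passages, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical map from older spellings to modern ones (illustrative only).
SPELLING_MAP = {"shew": "show", "compleat": "complete", "publick": "public"}

def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())

def normalize(text):
    """Replace the long s, tokenize, and modernize known spellings."""
    tokens = tokenize(text.replace("ſ", "s"))
    return [SPELLING_MAP.get(tok, tok) for tok in tokens]

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up passages standing in for an early and a modern edition.
edition_old = "A compleat account, to shew the publick his ſtrange adventures"
edition_new = "A complete account, to show the public his strange adventures"

raw_score = cosine_similarity(tokenize(edition_old), tokenize(edition_new))
clean_score = cosine_similarity(normalize(edition_old), normalize(edition_new))
print(f"raw: {raw_score:.2f}, normalized: {clean_score:.2f}")
```

Without normalization, the two passages look noticeably different even though they say the same thing; after normalization they match. A real pipeline would need a much larger spelling map or a dedicated historical-spelling tool.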

The team published their results at: https://orgilbatzaya.github.io/pirating-texts-site/


Year: 2018

Contact: Paul Bendich, Mathematics, bendich@math.duke.edu

Disciplines Involved: English, Literature, History, Geography, Visual & Media Studies

Project Lead: Charlotte Sussman

Project Manager: Grant Glass

This project aimed to develop better methods for humanities-based research by combining the open-ended nature of humanities projects with the methodological rigor of fields like statistics and computer science. Lucian Li noticed the potential for finding new ways to link these methods to the humanities: “The open-endedness gave us tremendous freedom to determine our modes of analysis and which parts of the data we would use.” Orgil Batzaya found drawing links between data insights and historical facts compelling: “We looked at distributions of the concentration of publication in different countries and it was fun trying to link historical periods to peaks and troughs in publication.” According to Gabe Guedes, some of these links became strikingly obvious: “As for the final outcome, I was surprised to be able to see such a strong correlation between historical events and publication volume, to the point where you had very noticeable peaks when countries made substantial imperial forays.”
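The peaks and troughs the students describe can be illustrated with a small sketch that counts editions per decade for one country and flags decades that stand above both neighbors. This is only an assumed reconstruction of the idea; the records are made up, and the function names are hypothetical rather than taken from the team's code.

```python
from collections import Counter

# Made-up (country, year) pairs standing in for edition metadata.
records = [
    ("England", 1719), ("England", 1722), ("England", 1725),
    ("England", 1731), ("England", 1733), ("England", 1736),
    ("England", 1738), ("England", 1745), ("England", 1761),
    ("France", 1721),
]

def counts_by_decade(records, country):
    """Count editions per decade for one country, filling empty decades with 0."""
    decades = Counter((year // 10) * 10 for place, year in records if place == country)
    if not decades:
        return {}
    start, end = min(decades), max(decades)
    return {d: decades.get(d, 0) for d in range(start, end + 10, 10)}

def peak_decades(series):
    """Return decades whose count is strictly higher than both neighboring decades."""
    decades = list(series)
    peaks = []
    for i in range(1, len(decades) - 1):
        before, current, after = (series[d] for d in decades[i - 1:i + 2])
        if current > before and current > after:
            peaks.append(decades[i])
    return peaks

england = counts_by_decade(records, "England")
print(england)
print(peak_decades(england))
```

Flagged decades could then be set beside a timeline of historical events for each country, which is the kind of comparison the quotes above describe.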

The team was directed and mentored by Grant Glass, a graduate student in the English Department at UNC-CH. Grant’s own research focuses on the question, what is a text? This project allowed Grant to begin to form the data structure for creating a new edition of Robinson Crusoe by understanding how thousands of copies are related to one another. The experience and insights took Grant by surprise: “I did not think that there was as much variance between the copies as there was. This new understanding of the text will help me describe how reading publics, publishers, and editors shape the text long after the author is gone.”
