Poverty in Writing & Images

Project Summary

Ashley Murray (Chemistry/Math), Brian Glucksman (Global Cultural Studies), and Michelle Gao (Statistics/Economics) spent 10 weeks analyzing how the meaning and use of the word “poverty” changed in presidential documents from the 1930s to the present. The students found that American presidential rhetoric about poverty has shifted in measurable ways over time. Presidential rhetoric, however, doesn’t necessarily affect policy change. As Michelle Gao explained, “The statistical methods we used provided another more quantitative way of analyzing the text. The database had around 130,000 documents, which is pretty impossible to read one by one and get all the poverty related documents by brute force. As a result, web-scraping and word filtering provided a more efficient and systematic way of extracting all the valuable information while minimizing human errors.” Through techniques such as linear regression, machine learning, and image analysis, the team analyzed large swaths of textual and visual data. This approach allowed them to zero in on significant documents for closer and more in-depth analysis, paying particular attention to documents by presidents such as Franklin Delano Roosevelt or Lyndon B. Johnson, both leaders in what LBJ famously called “The War on Poverty.”
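The word filtering and trend fitting described above can be illustrated with a minimal sketch. It is not the team's code: the sample records, the function names, and the choice to count mentions per decade are all assumptions made for illustration, and the real corpus of roughly 130,000 documents is not reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical sample records as (year, text) pairs; not taken from the real corpus.
DOCS = [
    (1935, "We must relieve poverty among the aged."),
    (1936, "The budget message concerns tariffs."),
    (1964, "This administration declares war on poverty. Poverty is a national problem."),
    (1965, "Trade policy and shipping."),
]

# Whole-word, case-insensitive match for "poverty".
POVERTY = re.compile(r"\bpoverty\b", re.IGNORECASE)


def filter_documents(docs):
    """Keep only the documents that mention the word 'poverty'."""
    return [(year, text) for year, text in docs if POVERTY.search(text)]


def mentions_per_decade(docs):
    """Count occurrences of 'poverty' per decade (e.g. 1964 falls in 1960)."""
    counts = defaultdict(int)
    for year, text in docs:
        counts[year // 10 * 10] += len(POVERTY.findall(text))
    return dict(counts)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


relevant = filter_documents(DOCS)
by_decade = mentions_per_decade(relevant)
decades = sorted(by_decade)
trend = ols_slope(decades, [by_decade[d] for d in decades])
```

Here `trend` is the change in mention count per year across decades. A fuller analysis would normalize by the number of documents or words per decade, since raw counts also reflect how much each president wrote.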

Click Here for the Executive Summary

Themes and Categories

Year: 2018

Contact: Paul Bendich, Mathematics, bendich@math.duke.edu

Disciplines Involved: English, Literature, History, Public Policy, Political Science, all Quantitative STEM

Project Lead: Astrid Giugni

Project Manager: Nora Nunn

The documents for analysis were provided by the American Presidency Project: http://www.presidency.ucsb.edu/index.php.

In addition, this project aimed to explore how to strengthen the link between data analysis and humanistic studies. Unlike many traditional STEM projects, the open-ended nature of this humanities project freed the students to take intellectual risks and venture into uncharted territory. Brian Glucksman found this to be an important part of the experience: “The main benefit that I felt about the open-endedness of the project was that it felt like it was impossible to fail. We had the opportunity to define the exact scope of our project, so we could never fall short of anything. It was even a little bit liberating to realize we could not do all the work that could be done from the American Presidency Project.”

Mentored by Nora Nunn, a graduate student in the English Department with no previous computational experience, the group paid close attention to narrative and storytelling over the summer. Nora’s own research is deeply grounded in political and ethical considerations, focusing on genocide in 20th-century transnational American literature and visual cultures. This project prompted her to take a fresh look at her own work: “My experience with Data+ showed me that the humanities and data science can at times form a symbiotic relationship. In fact, in light of this realization, I now view my own research—about the life of another word with political implications (genocide)—through a different lens. How do images and language connect or disconnect? And what are the political and social implications of these findings? In the case of Poverty in Writing and Images, social issues were inextricably intertwined with statistical ones. The symbiosis of algorithms and policy, social justice and big data, humanism and STEM left me with more questions than answers. For that experience, I am grateful.” Nora’s mentorship guided the students to make some of the same connections, prompting Ashley Murray to argue that the “usefulness of an algorithm is measured by how it can actually help/aid the humans utilizing it. This project’s aim was to look at social issues, which is inherently a way of helping other humans, and we are just using algorithms to do so.”

Related Projects

Producing oil and gas in the North Sea, off the coast of the United Kingdom, requires a lease to extract resources from beneath the ocean floor, and companies bid for those rights. This team will work with ExxonMobil to understand why these leases are acquired and who benefits. This requires historical data on bid history to investigate what leads to an increase in the number of (a) leases acquired and (b) companies participating in auctions. The goal of this team is to create a well-structured dataset based on company bid history from the U.K. Oil and Gas Authority. The source data will come in many different file structures and formats (tabular, PDF, etc.). The team will curate these data to create a single, tabular database of U.K. bid history and work programs.

Producing oil and gas in the Gulf of Mexico requires rights to extract these resources from beneath the ocean floor, and companies bid into the market for those rights. The top bids are sometimes significantly larger than the next highest bids, but it’s not always clear why this differential exists, and some companies seemingly overbid by large margins. This team will work with ExxonMobil to curate and analyze historical bid data from the Bureau of Ocean Energy Management that contains information on company bid history, infrastructure, wells, and seismic survey data, as well as data from the companies themselves and geopolitical events. The stretch goal of the team will be to see if they can uncover the rationale behind historic bidding patterns. What do the highest bidders know that other bidders do not (if anything)? What characteristics might incentivize overbidding to minimize the risk of losing the right to produce (i.e., ambiguity aversion)?

In this project, we are interested in creating a cohesive data pipeline for generating, modeling and visualizing basketball data. In particular, we are interested in understanding how to extract data from freely available video, how to model such data to capture player efficiency, strength and leadership, and how to visualize such data outcomes. We will have four separate teams as part of this project working on interrelated but separate goals:

Team 1: Video data extraction

This team will explore different video data extraction techniques with the goal of identifying player locations, ball location and events at any given time during a basketball game. The software developed as part of this project will be able to generate a usable dataset of time-stamped basketball plays that can be used to model the game of basketball.

Teams 2 & 3: Modeling basketball data: offense and defense

The two teams will explore different models for the game of basketball. The first team will concentrate on modeling offensive plays and try to answer questions such as: How does the ball advance? What leads to successful plays? The second team will concentrate on defensive plays: What is an optimal strategy for minimizing opponent scoring opportunities? How should we evaluate defensive plays?

Team 4: Visualizing basketball data

This team will work on dynamic and static visualization of elements of a basketball game. The goal of the visualization is to capture information about how players and the ball move around the court. They will develop tools to represent average trajectories in these settings that can also capture uncertainty about this information.

Faculty Leads: Alexander Volfovsky, James Moody, Katherine Heller

Project Managers: Fan Bu, Greg Spell, 2 more TBD