The students in this project worked on a pervasive question in literary, film, and copyright studies: how do we know when a new work of fiction borrows from an older one? Works are often appropriated rather than straightforwardly adapted, which makes the borrowing harder for human readers to trace. As we continue to remake and repurpose earlier texts into new forms that combine hundreds of references to other works (such as Ready Player One), tracking all the intertextual elements of a single text becomes increasingly laborious. Some borrowings are easy to spot, as with Marvel films that directly adapt comic book storylines and aesthetics; others are subtler, as when Disney reinterpreted Hamlet and African oral traditions to create The Lion King. Thousands of new stories are created each day, so how do we know whether a new story borrows from or appropriates a previous text? Are there adaptations of earlier works that we have yet to identify?
The students worked with data from over 16.7 million books from HathiTrust, with scholarly criticism accessible through JSTOR, and with the topic categories in Wikipedia. The group used Latent Dirichlet Allocation (LDA), a generative model that treats each document as a mixture of topics and each topic as a probability distribution over words, to surface key themes across the corpus. The students developed a flexible, graduated heuristic for identifying a work as an adaptation: the more pre-selected categories a work fit under, the more likely their model was to mark it as an adaptation. Over the summer, the students came to appreciate that all digital humanities methodologies are contestable and depend on traditional critical work.
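The graduated heuristic described above can be illustrated with a minimal sketch. It assumes a candidate pair of works is checked against a fixed set of evidence categories and scored by the fraction it matches. The category names, the `Candidate` type, the `adaptation_score` and `label` functions, and the label thresholds are all hypothetical; the source does not list the project's actual categories or cut-offs.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical evidence categories; the project's real pre-selected
# categories are not specified in the source.
CATEGORIES = (
    "shared_lda_topics",       # similar LDA topic mixtures in the two texts
    "jstor_critical_mention",  # scholarship discussing the two works together
    "wikipedia_category",      # a Wikipedia category linking the works
)


@dataclass
class Candidate:
    """A (possible adaptation, possible source) pair plus the categories it matched."""
    title: str
    source: str
    matched: set[str] = field(default_factory=set)


def adaptation_score(candidate: Candidate, categories=CATEGORIES) -> float:
    """Return the fraction of pre-selected categories the pair matches (0.0 to 1.0)."""
    hits = sum(1 for c in categories if c in candidate.matched)
    return hits / len(categories)


def label(score: float) -> str:
    """Map the graduated score to a coarse label; thresholds are illustrative only."""
    if score >= 0.5:
        return "likely adaptation"
    if score >= 0.3:
        return "possible adaptation"
    return "unlikely"


if __name__ == "__main__":
    # Placeholder works with hypothetical matches, not project data.
    pair = Candidate("Work A", "Work B", {"shared_lda_topics", "jstor_critical_mention"})
    score = adaptation_score(pair)
    print(f"{score:.2f}", label(score))
```

Scoring by fraction rather than a yes/no rule keeps the judgment graduated: each additional category a pair fits under raises its score, which mirrors the idea that more overlapping evidence makes an adaptation more likely without any single signal being decisive.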
Faculty Lead: Grant Glass