The students worked with data from over 16.7 million books from Hathitrust, with critical analysis in scholarly articles accessible through JSTOR, and with the topic categories in Wikipedia. The group used Latent Dirichlet Allocation (LDA), a generative model that assumes that all documents are a mixture of topics, to represent key themes and topics as a distribution over words. The students developed a flexible and graduated heuristic for identifying a work as an adaptation; the more pre-selected categories a work fit under, the more likely it was to be marked as an adaptation by their model. Over the summer, the students came to appreciate that all digital humanistic methodologies are contestable and dependent on traditional critical work.
Faculty Lead: Grant Glass