AI-powered Transcription of Historical Manuscripts

2022

A team of students will collaborate with Duke librarians to use AI-powered Handwriting Text Recognition (HTR) tools to transform thousands of pages of handwritten text into machine-readable data. Using a large dataset of digitized 19th- and early 20th-century women’s travel diaries held in the Rubenstein Library, students will test and evaluate various HTR technologies, document methods and constraints for extracting text from historical manuscripts, and build an HTR toolset and proof-of-concept interface that the library can extend in future projects. This work will further the library’s initiative to make its historical collections more readily available for computational research (and future Data+ projects!).

Project Leads: Molly Bragg and Noah Huffman
Project Manager: Anna Holleman

