Using digitized card catalogs from the David M. Rubenstein Rare Book and Manuscript Library, a team of students will explore ways to extract structured data from more than 115,000 subject cards in order to develop searchable and sortable descriptions of manuscript and archival collections. They will prepare the digitized subject cards for online access through the Internet Archive, and then use textual analysis tools and natural language processing techniques to create an index of structured metadata for publication in Duke’s Digital Repository. The team will then develop ways to visualize and search this dataset by research topic or term. Ultimately, the Rubenstein Library aims to build tools that allow users to search, sort, and export collection descriptions from the library’s old card catalogs. The team’s work is a critical piece of a broader initiative within the Rubenstein Library to find and describe historically marginalized voices in our archival collections, particularly those collections documenting BIPOC and Indigenous history.
Project Lead: Meghan Lyon