Using digitized card catalogs from the David M. Rubenstein Rare Book and Manuscript Library, a team of students explored ways to extract structured data from more than 115,000 subject cards in order to develop searchable and sortable descriptions of manuscript and archival collections. They prepared the digitized subject cards for online access in the Internet Archive, then used textual analysis tools and natural language processing techniques to create an index of structured metadata for publication in Duke’s Digital Repository. The team also developed ways to visualize and search this dataset by research topic or term. Ultimately, the Rubenstein Library is building tools that allow users to search, sort, and export collection descriptions from the library’s old card catalogs. The team’s work is a critical piece of a broader initiative within the Rubenstein Library to find and describe historically marginalized voices in our archival collections, particularly those collections documenting BIPOC and Indigenous history.
Project Lead: Meghan Lyon
View the team’s project poster here: team1_poster
View the team’s project video here: