Online data scraping has reached a fever pitch as AI creators seek food for their hungry models. Researchers from the Argus Lab at Duke are building tools to study web scraping at scale, drawing on Duke’s web logs. Data+ students will investigate the time scale of AI data scraping (e.g., the time from scraping to model inclusion) and the influence of different scrapers by planting content “honeypots” online. They will also use these honeypots to test whether synthetic, false, or other low-quality data is filtered out of scraped datasets.
Project Lead: Dr. Emily Wenger, ECE
Project Manager: Marcia dos Santos
View the team’s final poster here
Watch the team discuss this project


