Online data scraping has reached a fever pitch as AI creators seek food for their hungry models. Researchers from the Argus Lab at Duke are building tools to analyze web scraping at scale, using Duke’s web logs. Data+ students will plant content “honeypots” online to investigate the timescale of AI data scraping (e.g., the time from scraping to model inclusion) and the influence of different scrapers. They will also use these honeypots to test whether synthetic, false, or other low-quality data is filtered out of scraped datasets.
Project Lead: Dr. Emily Wenger, ECE