Building honeypots to track AI web scrapers


2025

Online data scraping has reached a fever pitch as AI creators seek food for their hungry models. Researchers from the Argus Lab at Duke are building tools to analyze web scraping at scale using Duke’s web logs. Data+ students will investigate the timescale of AI data scraping (e.g., the time from scraping to model inclusion) and the influence of different scrapers by planting content “honeypots” online. They will also use these honeypots to test whether synthetic, false, or other low-quality data is filtered out of scraped datasets.
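As a rough illustration of the honeypot idea, the sketch below generates a unique marker string to embed in a planted page and then scans web server logs for AI crawlers that fetched that page. It is not the team's tooling. It assumes logs in the common "combined" access log format, a hypothetical honeypot path, and a small illustrative list of crawler user-agent tokens. The function names `make_canary` and `honeypot_hits` are invented for this example.

```python
import re
import secrets

# Illustrative subset of crawler user-agent tokens (an assumption,
# not the list used by the project).
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot")

# Assumed "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def make_canary(prefix="hp"):
    """Return a unique marker string to embed in a honeypot page.

    If the marker later shows up in a model's output or a public dataset,
    it can be traced back to the page it was planted on.
    """
    return f"{prefix}-{secrets.token_hex(8)}"


def honeypot_hits(log_lines, honeypot_path):
    """Return (timestamp, crawler) pairs for AI crawlers that fetched the page."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None or match.group("path") != honeypot_path:
            continue
        agent = match.group("agent").lower()
        # Report the first known crawler token found in the user-agent string.
        crawler = next((t for t in AI_CRAWLER_TOKENS if t.lower() in agent), None)
        if crawler is not None:
            hits.append((match.group("time"), crawler))
    return hits
```

Comparing the timestamps returned by `honeypot_hits` with the date a marker first appears in a model or dataset would give one rough estimate of the time from scraping to inclusion. User-agent strings can be spoofed or omitted, so a match here is suggestive rather than conclusive.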

Project Lead: Dr. Emily Wenger, ECE

Project Manager: Marcia dos Santos

View the team’s final poster here


Watch the team discuss this project


Related News

Durham, NC — July 7, 2025 — As artificial intelligence tools grow more powerful, they’re also growing hungrier—for data. A team of Data+ students at Duke University is stepping in to...