Overview: Felicia Chen (Computer Science, Statistics), Nikkhil Pulimood (Computer Science, Mathematics), and James Wang (Statistics, Public Policy) spent ten weeks working with Counter Tools, a local nonprofit that provides support to over a dozen state health departments. The project goal was to understand how open-source data could be used to build a national database of tobacco retailers.
Project Results: The team performed a feasibility study addressing questions of technical accuracy and cost-effectiveness. Working mostly in R, they combined web scraping for data collection, machine learning and text mining for data classification, and Amazon Mechanical Turk (MTurk) for human validation, and constructed a viable dataset for North Carolina.
They presented findings at an informal briefing of civic leaders and planning officials.
"Coming in, I had little knowledge about what data science research entailed. Participating in Data+ was a great step and helped me better realize my career goals. I learned a host of interdisciplinary skills - ranging from web scraping to survey design – that can definitely be applied to future projects." — Felicia Chen, Computer Science & Public Policy