Open Data for Tobacco Retailer Mapping

Project Summary

Felicia Chen (Computer Science, Statistics), Nikkhil Pulimood (Computer Science, Mathematics), and James Wang (Statistics, Public Policy) spent ten weeks working with Counter Tools, a local nonprofit that supports over a dozen state health departments. The project goal was to understand how open-source data could be used to build a national database of tobacco retailers.

Contact
Paul Bendich
Mathematics
bendich@math.duke.edu

Project Results: The team performed a feasibility study addressing questions of technical accuracy and cost-effectiveness. Working mostly in R, they combined web scraping for data collection, machine learning and text mining for data classification, and Amazon Mechanical Turk (MTurk) for human validation, and were able to construct a viable dataset for North Carolina.

They presented findings at an informal briefing of civic leaders and planning officials.
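The classify-then-validate idea described above can be illustrated with a small sketch. This is not the team's code (they worked mostly in R) and it swaps their machine-learning classifier for a simple keyword score. The keyword weights, thresholds, function names, and listing names are all hypothetical. It shows only the general shape: score each scraped listing, accept clear matches, and route uncertain ones to human review.

```python
import re

# Hypothetical keyword weights; a trained model would learn these from labeled data.
KEYWORD_WEIGHTS = {
    "tobacco": 3, "vape": 3, "cigar": 3, "smoke": 2,
    "mart": 1, "gas": 1, "convenience": 1,
}

def tokenize(text):
    """Lowercase a listing name and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def score(name):
    """Sum the weights of known keywords found in the name (exact token match)."""
    return sum(KEYWORD_WEIGHTS.get(tok, 0) for tok in tokenize(name))

def triage(name, accept=3, review=1):
    """Return 'retailer', 'needs_review', or 'other' for one listing."""
    s = score(name)
    if s >= accept:
        return "retailer"
    if s >= review:
        return "needs_review"  # candidates for human validation
    return "other"

# Invented listing names, for illustration only.
listings = ["Main Street Tobacco & Vape", "Quick Mart", "Riverside Dental Care"]
labels = {name: triage(name) for name in listings}
```

In a setup like this, only the `needs_review` listings would go to human validators, which is one way the cost-effectiveness question could be approached.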

Partially funded by Counter Tools


Project Lead & Project Manager: Mike Dolan Fliss, Counter Tools


"Coming in, I had little knowledge about what data science research entailed. Participating in Data+ was a great step and helped me better realize my career goals. I learned a host of interdisciplinary skills, ranging from web scraping to survey design, that can definitely be applied to future projects." - Felicia Chen, Computer Science & Public Policy


Related Projects

United Nations Sustainable Development Goal 7 calls for universal access to affordable, reliable, sustainable, and modern energy. Researchers and practitioners around the world have responded to this call by producing a wealth of energy access data. While many data gaps still exist, are we capturing the full potential of the information and research we do have, and what it tells us about how to accelerate energy access? Power for All’s Platform for Energy Access Knowledge (PEAK) is an interactive knowledge platform designed to automatically curate, organize, and streamline large, growing bodies of data into digestible, sharable, and usable knowledge through automated data capture, indexing, and visualization. A team of students led by Rebekah Shirley will consult with Power for All to creatively visualize PEAK’s library, and to explore machine learning and natural language processing tools that can enable auto-extraction and visualization of data for more effective science communication.

Are there relative value opportunities in the global corporate bond markets?  
A team of students will work with Professor Emma Rasiel to understand whether an analysis of credit spreads on bonds issued by international firms in multiple countries over time can shed light on potential arbitrage opportunities. The team will have frequent opportunities to interact with analytics professionals at a leading financial advisory and asset management firm.

A team of students will consult with a leading financial advisory and asset management firm that is seeking to understand how big data can shed light on the secondary market for construction machinery. Students will explore a combination of publicly available datasets that describe the used-machinery market and its potential as an indicator of the business cycle. There will be frequent interactions with analytics professionals from the firm.