Open Data for Tobacco Retailer Mapping

Project Summary

Felicia Chen (Computer Science, Statistics), Nikkhil Pulimood (Computer Science, Mathematics), and James Wang (Statistics, Public Policy) spent ten weeks working with Counter Tools, a local nonprofit that provides support to over a dozen state health departments. The project goal was to understand how open source data can lead to the creation of a national database of tobacco retailers.

Year: 2017
Contact: Paul Bendich, Mathematics, bendich@math.duke.edu

Project Results: The team performed a feasibility study addressing questions of technical accuracy and cost-effectiveness. Working mostly in R, they combined web scraping for data collection, machine learning and text mining for data classification, and MTurk for human validation, and were able to construct a viable dataset for North Carolina.

They presented findings at an informal briefing of civic leaders and planning officials.

Partially funded by Counter Tools

Project Lead & Project Manager: Mike Dolan Fliss, Counter Tools

"Coming in, I had little knowledge about what data science research entailed. Participating in Data+ was a great step and helped me better realize my career goals. I learned a host of interdisciplinary skills, ranging from web scraping to survey design, that can definitely be applied to future projects." - Felicia Chen, Computer Science & Public Policy

Related Projects

This team is part of an ongoing project dedicated to exploring how states and local communities responded to the causes of the 2007-09 Global Financial Crisis. Led by faculty from the Global Financial Markets Center at Duke Law, the Data+ team will analyze multiple states' mortgage enforcement databases to gain a better understanding of how state regulators were, or were not, enforcing existing state law pertaining to mortgages in the period leading up to the crisis. Our website has an example of what this will look like: last year we analyzed North Carolina’s mortgage enforcement actions and displayed them by topic.

Project Lead: Lee Reiners

Project Manager: Malcolm Smith Fraser

Nationally, there is a disproportionate number of children of color (African American and Latino) in the child welfare system. Durham County is no different. However, the problem has not yet been reviewed through the lens of data to formulate or implement possible solutions. Durham County Department of Social Services Child & Family Services would like to evaluate its systems to identify where and how disproportionality and disparity are occurring. Is it occurring at the entry point of reporting child abuse and neglect? Is it occurring at the case decision? Is our reunification time different for African American children? Or does it take longer for a child of color to achieve permanence through adoption? Organizing the data to show us our “hot spots” would facilitate further discussion and focus on solutions to an age-old systemic problem.

Faculty Lead: Greg Herschlag

Project Lead: Jovetta L Whitfield

Student teams will develop a benchmark dataset and explore its efficacy in an in-house competition, where they will put innovative techniques such as machine learning to the test through a series of challenges.

A team of students will develop benchmark data pertaining to network performance in the presence of intentional and unintentional degradation, ranging from sensor failure and additive noise to adversarial interference. The students will analyze the baseline performance of the network and measure the performance of the degraded network with and without techniques intended to improve robustness. Students will have the opportunity to present findings to scientists and engineers from the Air Force Research Laboratory.

Faculty leads: Robert Calderbank, Vahid Tarokh, Ali Pezeshki

Client leads: Dr. Lauren Huie, Dr. Elizabeth Bentley, Dr. Zola Donovan, Dr. Ashley Prater-Bennette, Dr. Erin Trip

Project Manager: Suya Wu