An Integrated System for Accessing Large-Scale, Confidential Social Science Data

Project Summary

Large-scale databases from the social, behavioral, and economic sciences offer enormous potential benefits to society. However, as most stewards of social science data are acutely aware, wide-scale dissemination of such data can result in unintended disclosures of data subjects' identities and sensitive attributes, thereby violating promises, and in some instances laws, to protect data subjects' privacy and confidentiality.

Contact
Jerry Reiter
Statistical Science
jerry@stat.duke.edu

Supported by a grant from the National Science Foundation Data Infrastructure Building Blocks program, we are developing an integrated system for disseminating large-scale social science data. The system includes:

(i) Capability to generate highly redacted, synthetic data intended for wide access, coupled with

(ii) Means for approved researchers to access the confidential data via secure remote access solutions, glued together by

(iii) A verification server that allows users to assess the quality of their analyses with the redacted data, so that they can use remote data access more efficiently.
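
To make the role of the verification server in (iii) more concrete, here is a minimal sketch of one way such a check could work. It is not the project's actual method: it assumes the analysis is a simple mean, uses a normal-approximation confidence interval, and scores agreement with an interval-overlap measure. The function names, the 0.5 threshold, and the coarse rounding of the returned score are all hypothetical choices made for illustration.

```python
import math
import statistics


def mean_ci(values, z=1.96):
    """Normal-approximation confidence interval for the mean (assumed analysis)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (m - z * se, m + z * se)


def interval_overlap(ci_a, ci_b):
    """Average share of each interval covered by their intersection (0 to 1)."""
    lo = max(ci_a[0], ci_b[0])
    hi = min(ci_a[1], ci_b[1])
    if hi <= lo:
        return 0.0  # the intervals do not intersect
    width = hi - lo
    return 0.5 * (width / (ci_a[1] - ci_a[0]) + width / (ci_b[1] - ci_b[0]))


def verify(synthetic, confidential, threshold=0.5):
    """Hypothetical server-side check: compare the same analysis on both datasets.

    Only a coarsely rounded score and a yes/no flag are returned, so the
    confidential estimate itself is never released (an illustrative assumption).
    """
    score = interval_overlap(mean_ci(synthetic), mean_ci(confidential))
    return {"overlap": round(score, 1), "adequate": score >= threshold}
```

In this sketch, a user who sees a low overlap score would know that results from the synthetic data are unreliable for that analysis and that applying for remote access to the confidential data is worthwhile.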

Related Projects

A large and growing trove of patient, clinical, and organizational data is collected as a part of the “Help Desk” program at Durham’s Lincoln Community Health Center. Help Desk is a group of student volunteers who connect with patients over the phone and help them navigate to community resources (like food assistance programs, legal aid, or employment centers). Data-driven approaches to identifying service gaps, understanding the patient population, and uncovering unseen trends are important for improving patient health and advocating for the necessity of these resources. Disparities in food security, economic stability, education, neighborhood and physical environment, community and social context, and access to the healthcare system are crucial social determinants of health, which studies indicate account for nearly 70% of all health outcomes.

Our team examined the relationship between race and home values across several units of analysis (household, address, HOLC rating area, census block, block group, and tract) in Durham, NC. We combined data from the decennial censuses (1940-2010), American Community Survey (2005-2018), Durham County Register of Deeds (1997-2020), and Durham County Tax Administration (1997-2021). We found that home values are strongly associated with the racial composition of areas, that homes in Black neighborhoods are worth less, and that they accumulate less value over time.

Project Leads: William Darity Jr.

Project Manager: Omer Ali

This summer, our objective was to take data provided by the Durham County Detention Facility (DCDF), Duke Health, and Lincoln Community Health Center and analyze trends across the local justice system and these health care institutions, specifically with regard to individuals with mental illness. We analyzed the experience of individuals who were incarcerated by looking at their demographic characteristics, emergency department usage, and criminal justice encounters. Building on these initial findings, we hope to continue the work during the school year through a Bass Connections team, to better understand the relationship between health care utilization and rates of recidivism in Durham County.

Project Leads: Nicole Schramm-Sapyta, Maria Tackett

Project Manager: Ruth Wygle

 
