An Integrated System for Accessing Large-Scale, Confidential Social Science Data

Project Summary

Large-scale databases from the social, behavioral, and economic sciences offer enormous potential benefits to society. However, as most stewards of social science data are acutely aware, wide-scale dissemination of such data can result in unintended disclosures of data subjects' identities and sensitive attributes, thereby violating promises–and in some instances laws to protect data subjects' privacy and confidentiality. 

Themes and Categories
Jerry Reiter
Statistical Science

Supported by a grant from the National Science Foundation Data Infrastructure Building Blocks program, we are developing an integrated system for disseminating large-scale social science data. The system includes:

(i) Capability to generate highly redacted, synthetic data intended for wide access, coupled with

(ii) Means for approved researchers to access the confidential data via secure remote access solutions, glued together by

(iii) A verification server that allows users to assess the quality of their analyses with the redacted data so as to be more efficient with their use of remote data access.

Related People

Related Projects

Robbie Ha (Computer Science, Statistics), Peilin Lai  (Computer Science, Mathematics), and Alejandro Ortega (Mathematics) spent ten weeks analyzing the content and dissemination of images of the Syrian refugee crisis, as part of a general data-driven investigation of Western photojournalism and how it has contributed to our understanding of this crisis.

Building off the work of a 2016 Data+ teamYu Chen (Economics), Peter Hase (Statistics), and Ziwei Zhao (Mathematics), spent ten weeks working closely with analytical leadership at Duke's Office of University Development. The project goal was to identify distinguishing characteristics of major alumni donors and to model their lifetime giving behavior.

Over ten weeks, Computer Science Majors Amber Strange and Jackson Dellinger joined forces with Psychology major Rachel Buchanan to perform a data-driven analysis of mental health intervention practices by Durham Police Department. They worked closely with leadership from the Durham Crisis Intervention Team (CIT) Collaborative, made up of officers who have completed 40 hours of specialized training in mental illness and crisis intervention techniques.