Large-scale databases from the social, behavioral, and economic sciences offer enormous potential benefits to society. However, as most stewards of social science data are acutely aware, wide-scale dissemination of such data can result in unintended disclosures of data subjects’ identities and sensitive attributes, thereby violating promises, and in some instances laws, to protect data subjects’ privacy and confidentiality.
Supported by a grant from the National Science Foundation Data Infrastructure Building Blocks program, we are developing an integrated system for disseminating large-scale social science data. The system includes:
(i) Capability to generate highly redacted, synthetic data intended for wide access, coupled with
(ii) Means for approved researchers to access the confidential data via secure remote access solutions, glued together by
(iii) A verification server that allows users to assess the quality of their analyses with the redacted data, so that they can use remote data access more efficiently.
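To make the role of component (iii) concrete, the following is a minimal sketch of one way a verification check could work. It is an illustration only, not a description of the system itself: the function names `slope` and `verify`, the relative-difference rule, the 10% tolerance, and the toy data are all assumptions introduced here. The idea is that a user's analysis is run on both the synthetic and the confidential data, and only a coarse agreement flag is returned.

```python
import statistics


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def verify(analysis, synthetic, confidential, tolerance=0.10):
    """Hypothetical verification check (assumed design, not the actual server).

    Runs the same analysis on both data sets and reports whether the
    synthetic estimate is within `tolerance` (relative) of the confidential
    one. The confidential estimate itself is never returned.
    """
    est_syn = analysis(*synthetic)
    est_conf = analysis(*confidential)
    rel_diff = abs(est_syn - est_conf) / max(abs(est_conf), 1e-12)
    return {"synthetic_estimate": est_syn, "agrees": rel_diff <= tolerance}


# Toy data, invented for illustration only.
confidential = ([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0])
synthetic = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])

result = verify(slope, synthetic, confidential)
print(result["agrees"])
```

A user who sees that the synthetic result agrees may not need remote access at all, while a disagreement signals that the analysis should be rerun on the confidential data. A real verification server would also need to limit what such flags themselves reveal about the confidential data; this sketch does not address that.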