<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://hdl.handle.net/1813/30502">
    <title>eCommons Community:</title>
    <link>http://hdl.handle.net/1813/30502</link>
    <description />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://hdl.handle.net/1813/33184" />
        <rdf:li rdf:resource="http://hdl.handle.net/1813/30937" />
        <rdf:li rdf:resource="http://hdl.handle.net/1813/30925" />
        <rdf:li rdf:resource="http://hdl.handle.net/1813/30924" />
        <rdf:li rdf:resource="http://hdl.handle.net/1813/30923" />
        <rdf:li rdf:resource="http://hdl.handle.net/1813/30922" />
      </rdf:Seq>
    </items>
    <dc:date>2013-05-08T14:05:44Z</dc:date>
  </channel>
  <item rdf:about="http://hdl.handle.net/1813/33184">
    <title>Estimating identification disclosure risk using mixed membership models</title>
    <link>http://hdl.handle.net/1813/33184</link>
    <description>Title: Estimating identification disclosure risk using mixed membership models
Authors: Manrique-Vallier, Daniel; Reiter, Jerome
Abstract: Statistical agencies and other organizations that disseminate data are obligated to protect data subjects' confidentiality. For example, ill-intentioned individuals might link data subjects to records in other databases by matching on common characteristics (keys). Successful links are particularly problematic for data subjects with combinations of keys that are unique in the population. Hence, as part of their assessments of disclosure risks, many data stewards estimate the probabilities that sample uniques on sets of discrete keys are also population uniques on those keys. This is typically done using log-linear modeling on the keys. However, log-linear models can yield biased estimates of cell probabilities for sparse contingency tables with many zero counts, which often occurs in databases with many keys. This bias can result in unreliable estimates of probabilities of uniqueness and, hence, misrepresentations of disclosure risks. We propose an alternative to log-linear models for datasets with sparse keys based on a Bayesian version of grade of membership (GoM) models. We present a Bayesian GoM model for multinomial variables and offer an MCMC algorithm for fitting the model. We evaluate the approach by treating data from a recent US Census Bureau public use microdata sample as a population, taking simple random samples from that population, and benchmarking estimated probabilities of uniqueness against population values. Compared to log-linear models, GoM models provide more accurate estimates of the total number of uniques in the samples. Additionally, they offer record-level predictions of uniqueness that dominate those based on log-linear models.</description>
    <dc:date>2012-01-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/1813/30937">
    <title>Revisiting the Economics of Privacy: Population Statistics and Privacy as Public Goods</title>
    <link>http://hdl.handle.net/1813/30937</link>
    <description>Title: Revisiting the Economics of Privacy: Population Statistics and Privacy as Public Goods
Authors: Abowd, John
Abstract: Anonymization and data quality are intimately linked. Although this link has been properly acknowledged in the Computer Science and Statistical Disclosure Limitation literatures, economics offers a framework for formalizing the linkage and analyzing optimal decisions and equilibrium outcomes.
Description: The opinions expressed in this presentation are those of the author and not those of the National Science Foundation or the Census Bureau.</description>
    <dc:date>2013-01-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/1813/30925">
    <title>The NSF-Census Research Network: Cornell Node</title>
    <link>http://hdl.handle.net/1813/30925</link>
    <description>Title: The NSF-Census Research Network: Cornell Node
Authors: Block, William C.; Lagoze, Carl; Vilhuber, Lars; Brown, Warren A.; Williams, Jeremy; Arguillas, Florio
Abstract: Cornell University has received a $3M NSF-Census Research Network (NCRN) award to improve the documentation and discoverability of both public and restricted data from the federal statistical system. The current internal name for this project is the Comprehensive Extensible Data Documentation and Access Repository (CED²AR). The diagram to the right provides a high-level architectural overview of the system to be implemented.
The CED²AR will be based upon leading metadata standards such as the Data Documentation Initiative (DDI) and Statistical Data and Metadata eXchange (SDMX) and be flexibly designed to ingest documentation from a variety of source files. It will permit synchronization between the public and confidential instances of the repository. The scholarly community will be able to use the CED²AR as it would a conventional metadata repository, deprived only of the values of certain confidential information, but not their metadata. The authorized user, working on the secure Census Bureau network, could use the CED²AR with full information in authorized domains.</description>
    <dc:date>2012-06-06T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/1813/30924">
    <title>Data Management of Confidential Data</title>
    <link>http://hdl.handle.net/1813/30924</link>
    <description>Title: Data Management of Confidential Data
Authors: Lagoze, Carl; Block, William C.; Williams, Jeremy; Abowd, John M.; Vilhuber, Lars
Abstract: Social science researchers increasingly make use of data that is confidential because it contains linkages to the identities of people, corporations, etc. The value of this data lies in the ability to join the identifiable entities with external data such as genome data, geospatial information, and the like. However, the confidentiality of this data is a barrier to its utility and curation, making it difficult to fulfill US federal data management mandates and interfering with basic scholarly practices such as validation and reuse of existing results. We describe the complexity of the relationships among data that span a public and private divide. We then describe our work on the CED2AR prototype, a first step in providing researchers with a tool that spans this divide and makes it possible for them to search, access, and cite that data.</description>
    <dc:date>2013-01-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/1813/30923">
    <title>A Proposed Solution to the Archiving and Curation of Confidential Scientific Inputs</title>
    <link>http://hdl.handle.net/1813/30923</link>
    <description>Title: A Proposed Solution to the Archiving and Curation of Confidential Scientific Inputs
Authors: Abowd, John M.; Vilhuber, Lars; Block, William
Abstract: We develop the core of a method for solving the data archive and curation problem that confronts the custodians of restricted-access research data and the scientific users of such data. Our solution recognizes the dual protections afforded by physical security and access limitation protocols. It is based on extensible tools and can be easily incorporated into existing instructional materials.</description>
    <dc:date>2012-01-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/1813/30922">
    <title>An Early Prototype of the Comprehensive Extensible Data Documentation and Access Repository (CED2AR)</title>
    <link>http://hdl.handle.net/1813/30922</link>
    <description>Title: An Early Prototype of the Comprehensive Extensible Data Documentation and Access Repository (CED2AR)
Authors: Block, William C.; Williams, Jeremy; Abowd, John M.; Vilhuber, Lars; Lagoze, Carl
Abstract: This presentation will demonstrate the latest DDI-related technological developments of Cornell University’s $3 million NSF-Census Research Network (NCRN) award, dedicated to improving the documentation, discoverability, and accessibility of public and restricted data from the federal statistical system in the United States. The current internal name for our DDI-based system is the Comprehensive Extensible Data Documentation and Access Repository (CED²AR). CED²AR ingests metadata from heterogeneous sources and supports filtered synchronization between restricted and public metadata holdings. Currently-supported CED²AR “connector workflows” include mechanisms to ingest IPUMS, zero-observation files from the American Community Survey (DDI 2.1), and SIPP Synthetic Beta (DDI 1.2). These disparate metadata sources are all transformed into a DDI 2.5 compliant form and stored in a single repository. In addition, we will demonstrate an extension to DDI 2.5 that allows for the labeling of elements within the schema to indicate confidentiality. This metadata can then be filtered, allowing the creation of derived public use metadata from an original confidential source. This repository is currently searchable online through a prototype application demonstrating the ability to search across previously heterogeneous metadata sources.
Description: Presentation at the 4th Annual European DDI User Conference (EDDI12), Norwegian Social Science Data Services, Bergen, Norway, 3 December 2012</description>
    <dc:date>2012-12-03T00:00:00Z</dc:date>
  </item>
</rdf:RDF>

