Lab 12: Synthetic Economic Data

Create some fully synthetic data from the 2002 Economic Census frame.

The input data are in econcen2002.sas7bdat.

Variable list
NAICS_2002: full NAICS code (2002 standard)
NAICS_Order: used to sort the data ascending by full NAICS code
NAICS_Level: 2=sector; 3=sub-sector; 4=industry group; 5=NAICS North American industry; 6=NAICS US industry; 7=NAICS sub industry
NAICS_Definition: text definition
Establishments: number of establishments
Sales_Receipts: sales, receipts, or other volume measure (in $1,000)
Annual_Payroll: total annual payroll (in $1,000)
Employment: Number of employees

1. The input data are public use data from the official Census releases for the 2002 Economic Census. At this point you do not have access to any within-NAICS summary data. Analytically valid synthetic data should reproduce total sales, annual payroll, employment, sales per employee, payroll per employee, and payroll per dollar of sales.
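The six target quantities can be written down as a small function. This is a minimal sketch, not a prescribed method: the function name estimands, the output key names, and the toy records are illustrative assumptions, and the three ratios are taken as ratios of totals (an assumption; a mean of establishment-level ratios is a different estimand).

```python
def estimands(records):
    """Compute the three totals and three ratios from establishment records.

    Each record is a dict with the keys Sales_Receipts, Annual_Payroll,
    and Employment. The dollar fields are in $1,000, as in the variable list.
    """
    records = list(records)  # allow any iterable; we pass over it three times
    sales = sum(r["Sales_Receipts"] for r in records)
    payroll = sum(r["Annual_Payroll"] for r in records)
    employment = sum(r["Employment"] for r in records)
    return {
        "total_sales": sales,
        "total_payroll": payroll,
        "total_employment": employment,
        # Assumption: each ratio is a ratio of totals,
        # not an average of per-establishment ratios.
        "sales_per_employee": sales / employment,
        "payroll_per_employee": payroll / employment,
        "payroll_per_sales_dollar": payroll / sales,
    }


# Hypothetical toy records, NOT taken from econcen2002.sas7bdat.
toy = [
    {"Sales_Receipts": 1000, "Annual_Payroll": 200, "Employment": 10},
    {"Sales_Receipts": 3000, "Annual_Payroll": 600, "Employment": 30},
]
result = estimands(toy)
```

Because sales and payroll are both in $1,000, payroll per dollar of sales is unit-free, while the two per-employee ratios are in $1,000 per employee.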

2. Pick a NAICS industry group. If its aggregate data do not exist, create them by summing over the appropriate NAICS North American industry and/or NAICS US industry data. Make modeling assumptions about the joint distribution of total sales, annual payroll, and employment for establishments in your NAICS industry group. You may use other sources if you wish, or just use the data in the more detailed NAICS codes. Synthesize 5 establishment populations.
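One possible way to carry out this step is sketched below. Everything in it is an assumption made for illustration: the helper names aggregate_group and synthesize_population, the lognormal model for establishment size, the parameters sigma_emp and sigma_ratio, and the toy rows (whose numbers are made up and are not from the Census file). The sketch sums the more detailed rows at a single NAICS level to avoid double counting, then generates establishments and rescales them so each synthetic population reproduces the aggregate totals.

```python
import random

KEYS = ("Establishments", "Sales_Receipts", "Annual_Payroll", "Employment")


def aggregate_group(rows, prefix, level=6):
    """Sum published totals over rows at ONE NAICS level under a code prefix."""
    totals = dict.fromkeys(KEYS, 0)
    for row in rows:
        code = str(row["NAICS_2002"])
        if row["NAICS_Level"] == level and code.startswith(prefix):
            for key in KEYS:
                totals[key] += row[key]
    return totals


def rescale(values, target):
    """Scale values proportionally so that they sum to target."""
    factor = target / sum(values)
    return [v * factor for v in values]


def synthesize_population(totals, rng, sigma_emp=1.0, sigma_ratio=0.3):
    """Draw one synthetic establishment population (assumed lognormal model).

    Assumptions: employment is lognormal; payroll and sales are employment
    times independent lognormal noise. Employment is left continuous.
    """
    n = int(totals["Establishments"])
    emp = [rng.lognormvariate(0.0, sigma_emp) for _ in range(n)]
    pay = [e * rng.lognormvariate(0.0, sigma_ratio) for e in emp]
    sales = [e * rng.lognormvariate(0.0, sigma_ratio) for e in emp]
    # Force each synthetic population to reproduce the aggregate totals.
    emp = rescale(emp, totals["Employment"])
    pay = rescale(pay, totals["Annual_Payroll"])
    sales = rescale(sales, totals["Sales_Receipts"])
    return [
        {"Employment": e, "Annual_Payroll": p, "Sales_Receipts": s}
        for e, p, s in zip(emp, pay, sales)
    ]


# Hypothetical rows with made-up values, shaped like the variable list.
rows = [
    {"NAICS_2002": "311111", "NAICS_Level": 6, "Establishments": 120,
     "Sales_Receipts": 500000, "Annual_Payroll": 60000, "Employment": 2000},
    {"NAICS_2002": "311119", "NAICS_Level": 6, "Establishments": 80,
     "Sales_Receipts": 300000, "Annual_Payroll": 40000, "Employment": 1000},
    {"NAICS_2002": "31111", "NAICS_Level": 5, "Establishments": 200,
     "Sales_Receipts": 800000, "Annual_Payroll": 100000, "Employment": 3000},
    {"NAICS_2002": "311211", "NAICS_Level": 6, "Establishments": 50,
     "Sales_Receipts": 90000, "Annual_Payroll": 9000, "Employment": 400},
]

totals = aggregate_group(rows, "3111", level=6)
# Five populations, one per seed, so the results are reproducible.
populations = [synthesize_population(totals, random.Random(seed)) for seed in range(5)]
```

Rescaling guarantees that the three totals, and therefore the three ratios of totals, match the aggregates exactly in every population. What the modeling assumptions actually control is the within-group distribution, which is what the samples in the next step will be sensitive to.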

3. Draw a 5% simple random sample from each of your synthetic populations. Demonstrate the analytic validity of the synthetic data for the following estimands: total sales, annual payroll, employment, sales per employee, payroll per employee, and payroll per dollar of sales.
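The sampling and estimation step might look like the sketch below. It assumes simple random sampling without replacement, the expansion estimator (N/n times the sample sum) for totals, and ratios of estimated totals for the three ratios. The names draw_srs, estimate, and make_toy_population are hypothetical, and the toy population stands in for one of your synthetic populations.

```python
import random


def draw_srs(population, fraction, rng):
    """Simple random sample without replacement of about fraction * N units."""
    n = max(1, round(fraction * len(population)))
    return rng.sample(population, n)


def estimate(sample, population_size):
    """Expansion estimates of the totals, plus ratios of estimated totals."""
    weight = population_size / len(sample)  # each sampled unit represents N/n units
    sales = weight * sum(r["Sales_Receipts"] for r in sample)
    payroll = weight * sum(r["Annual_Payroll"] for r in sample)
    employment = weight * sum(r["Employment"] for r in sample)
    return {
        "total_sales": sales,
        "total_payroll": payroll,
        "total_employment": employment,
        "sales_per_employee": sales / employment,
        "payroll_per_employee": payroll / employment,
        "payroll_per_sales_dollar": payroll / sales,
    }


def make_toy_population(size, rng):
    """Stand-in population (assumed lognormal sizes), for illustration only."""
    population = []
    for _ in range(size):
        emp = rng.lognormvariate(2.0, 1.0)
        population.append({
            "Employment": emp,
            "Annual_Payroll": emp * rng.lognormvariate(3.0, 0.3),
            "Sales_Receipts": emp * rng.lognormvariate(5.0, 0.3),
        })
    return population


rng = random.Random(2002)
population = make_toy_population(2000, rng)
sample = draw_srs(population, 0.05, rng)

estimates = estimate(sample, len(population))
# With the full population as the "sample", the weight is 1 and we get the truth.
truth = estimate(population, len(population))
relative_error = {key: estimates[key] / truth[key] - 1 for key in truth}
```

Repeating this for each of the 5 synthetic populations and comparing the estimates with the published aggregates is one way to present the validity check. A single 5% sample from a skewed population can miss the totals by a noticeable margin, so it may also help to report standard errors or repeat the sampling several times.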