The Government Just Banned Math (only mildly hyperbolic)

404 error page on the Census Bureau's website

Getting people to care about statistical methodologies is a challenge. Hence the hyperbolic title, but give us a few minutes because what is happening matters for you and your privacy.

Last night, Hansi Lo Wang, a correspondent at NPR, reported that multiple pages that mention “differential privacy” or “noise infusion” have been removed from the Census Bureau’s website. This is in response to a new policy issued by the Department of Commerce on June 4. This policy effectively bans the use of noise infusion in favor of coarsening methods to avoid disclosure in statistical products produced by the Census Bureau and the Bureau of Economic Analysis.

While all of this might sound very … niche … the effect of the policy change is quite concerning. The team at DataIndex published a detailed explanation of noise infusion and differential privacy, as well as the potential impacts of this new policy. Most notably, differential privacy is a form of noise infusion used in the 2020 Decennial Census. While the method has garnered some debate and controversy, the goal is ultimately to protect privacy by limiting how much data about an individual can be reconstructed. John Abowd, the former Associate Director for Research and Methodology and Chief Scientist at the Census Bureau, recently listed the many products that use noise infusion in a public LinkedIn post:

Noise Infusion Banned by Department of Commerce: Affected Data Products | John Abowd posted on the topic | LinkedIn
I’ve been asked these questions so many times over the past week that I’m going to post the answers here.

What is noise infusion (context: statistical disclosure limitation)? Noise infusion is any statistical disclosure limitation mechanism where the published statistic differs from the same statistic calculated from the original confidential data due to randomness deliberately introduced into the calculation and independent of the data collection design. This definition covers input noise infusion (randomness applied to the input confidential data before the statistics are calculated; examples local differential privacy, multiply inputs by a random number, adding a random number to intermediate calculations before the output), output noise infusion (adding a random number to the statistic after it was calculated from the confidential data; examples central differential privacy), swapping based on random sampling from the set of candidate swap pairs, sub sampling for confidentiality, randomized rounding (rounding to a specified precision going up or down based on a random number), synthetic data (the published statistic is sampled randomly from a probability distribution estimated from the confidential data).

What data products are affected by the recent Department of Commerce order banning noise infusion?

1. All tables and PUMS from the American Community Survey (noise infusion is used throughout: the swapping puts 100% of households at risk and the swap is chosen randomly from the key matches, DP is used in the unweighted tables, input noise infusion is used on the age data, the PUMS is sampled explicitly for confidentiality protection.)
2. Many tables from the Economic Censuses (suppression is the primary disclosure avoidance mechanism)
3. All tables from County Business Patterns and Zipcode Business Patterns (input noise infusion is the primary disclosure avoidance mechanism)
4. OnTheMap and OnTheMap for Emergency Management (input noise infusion on the employer side, differential privacy on the residential side)
5. Business Dynamics Statistics (input noise infusion)
6. Business Formation Statistics (differential privacy)
7. Post-secondary Educational Outcomes (differential privacy, same method as used by the IRS and Department of Education in the College Scorecard)
8. Veterans Employment Outcomes (differential privacy)
9. Geospatial Environmental Data (differential privacy)
10. Opportunity Atlas (output noise infusion)
11. Quarterly Workforce Indicators (input noise infusion)
12. Job-to-Job Flows (input noise infusion)
13. All tables and microdata from the 2020 Census of Population and Housing including redistricting data (differential privacy)
14. All experimental data products under current development (most use differential privacy, all use some form of noise infusion)
15. Many releases from the Federal Statistical Research Data Centers (differential privacy is required on many tabular summaries accompanying regression tables)
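
For readers who want to see what these terms look like in practice, here is a toy Python sketch of two mechanisms named in the post (output noise infusion and randomized rounding) next to a simple deterministic coarsening rule. This is our illustration only, not the Census Bureau's production code: the function names, the epsilon parameter, the choice of Laplace noise with sensitivity 1, and the rounding base are all assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent Exponential(1) draws follows a
    # Laplace(0, 1) distribution; multiplying by `scale` gives Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def output_noise_infusion(true_count, epsilon, rng):
    # Output noise infusion: compute the statistic from the confidential
    # data first, then add a random number to it before publishing.
    # Assumption for this sketch: a simple count with sensitivity 1,
    # so the Laplace scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


def randomized_round(value, base, rng):
    # Randomized rounding: round up or down to a multiple of `base`
    # depending on a random number. Rounding up with probability
    # remainder / base keeps the published value correct on average.
    lower = (value // base) * base
    remainder = value - lower
    if rng.random() < remainder / base:
        return lower + base
    return lower


def coarsen(value, base):
    # Deterministic coarsening (hypothetical example rule): always round
    # down to a multiple of `base`. No randomness is involved.
    return (value // base) * base


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the demo is repeatable
    true_count = 12            # made-up number, not real data
    print("noisy count:      ", output_noise_infusion(true_count, 1.0, rng))
    print("randomized round: ", randomized_round(true_count, 5, rng))
    print("coarsened:        ", coarsen(true_count, 5))
```

The first two functions publish a value that depends on a random draw, which is what makes them noise infusion under the definition quoted above. The last one always gives the same answer for the same input.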

Again, all of this may seem a bit technical, but it has implications beyond the stats methods nerds. As DataIndex noted: 

While researchers are often primary data users, that loss would be felt far beyond academic research. Detailed public data help local governments plan services, businesses understand markets, workforce boards identify labor trends, advocates document inequities, journalists hold institutions accountable, and communities make the case for resources. 

Hansi Lo Wang identified reports that have already been removed, but luckily our volunteers, led by Lena Bohman, were proactive in collecting numerous Census Bureau working papers and other publications and uploading them to DataLumos. If any volunteer is interested, we would support a project to separate the papers on differential privacy and noise infusion and make them public again - just send us a note, and we’ll get you on the right track. In addition, the Wayback Machine has backups of census.gov websites if you need to track down a page by URL.

Regarding the datasets that used noise infusion, it is unclear how this policy will impact public access. The policy is intended to be retroactive, raising concerns that data might be removed, but how that will play out is uncertain. DataIndex is calling on Commerce to publish an implementation plan in the Federal Register so we can better understand the impact and, more importantly, comment on the proposed plan.

The key for us is to be aware and ready to speak up when needed! Even on behalf of math. 🛟 📊
