Guest Post: Where do Official Statistics Come From?

Photo Credit: Census Bureau

Today, we have the sixth post in the series to help diverse audiences understand and support the federal statistical system. This post, from Barry Johnson, provides an overview of different sources of federal data.

Check out their first post in the series on the uses of public data from reducing lead exposure in consumer products to improving agricultural productivity. Their fifth post presents two ideas for supporting, and rewarding, data collectors.


In previous posts, we’ve raised the point that federal statistics are the essential—though often unrecognized—ingredients that impact all aspects of our lives: from determining whether we need a coat or umbrella in the morning, to selecting a college or major, or even deciding where to start a new business.  In this post, we explore the sources of these statistics.^

In the Beginning… Censuses

The 1790 population census, required under the newly adopted Constitution, was the source of the earliest U.S. statistics.  Based on similar censuses conducted by the British government throughout the colonial period, it was a simple headcount of every American, classifying them by age, sex, and race.  Twenty years later, data collection had been expanded to include information on manufacturing; by 1850, questions on other industries and social data, such as information on churches, poverty, and crime, were added.  

The Rise of Surveys

As the population grew, so did the need for timelier information. Once-a-decade censuses were no longer sufficient. Fortunately, the development of large-scale probability surveys in the 1930s provided the perfect foundation for expanding federal statistics. Modern sampling theory demonstrated that information collected from a relatively small sample of respondents could be used to estimate characteristics of entire populations. Compared with censuses, surveys provided detailed information more quickly, at much lower cost, and with less burden on the public.
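To make the sampling idea concrete, here is a minimal sketch of estimating a population share from a simple random sample. The population, the 30% share, the sample size, and the function name `estimate_proportion` are all made up for illustration; real federal surveys use far more elaborate designs (stratification, clustering, weighting) that this sketch does not attempt to show.

```python
import math
import random


def estimate_proportion(population, sample_size, seed=0):
    """Estimate the share of 1s in `population` from a simple random sample.

    Returns the estimated share and its standard error.
    """
    rng = random.Random(seed)
    # Draw without replacement, so no one is sampled twice.
    sample = rng.sample(population, sample_size)
    p_hat = sum(sample) / sample_size

    # Standard error of a proportion under simple random sampling,
    # including the finite population correction.
    fpc = (len(population) - sample_size) / (len(population) - 1)
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size * fpc)
    return p_hat, se


# Hypothetical population of one million people, 30% of whom have
# some characteristic of interest (coded as 1).
population = [1] * 300_000 + [0] * 700_000

p_hat, se = estimate_proportion(population, sample_size=2_000)
print(f"estimated share: {p_hat:.3f}")
print(f"approx. 95% margin of error: +/- {1.96 * se:.3f}")
```

In this toy setup, a sample of 2,000 out of one million (0.2% of the population) typically lands within about two percentage points of the true share, which is the basic reason a survey can stand in for a full count.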

Until recently, estimates based on survey data dominated federal statistics. In recent years, however, rising data collection costs and declining survey response rates have raised concerns.  A recent report noted that, since 1990, response rates on major federal surveys have fallen from about 90% to about 70% for the Current Population Survey and to about 40% for the Consumer Expenditure Survey.  Difficulty in contacting households, privacy concerns, and survey overload due to the proliferation of satisfaction and opinion surveys have all been cited as reasons for unwillingness to participate.  While less well studied, item non-response, in which respondents skip individual questions while completing a survey, has also risen.  The Committee on National Statistics noted that both declining response rates and increased item non-response can lead to incomplete measurement and increase the risk of inaccuracies in statistical information.
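A small worked example may help show why falling response rates matter. The sketch below assumes a hypothetical population split into two equal groups with different average incomes and different response rates; every number and function name here is invented for illustration and is not drawn from any federal survey.

```python
def true_mean(groups):
    """Population mean: each group's mean weighted by its population share."""
    return sum(share * mean for share, _rate, mean in groups)


def respondent_mean(groups):
    """Mean among respondents only, with no adjustment for non-response."""
    responding = sum(share * rate for share, rate, _mean in groups)
    total = sum(share * rate * mean for share, rate, mean in groups)
    return total / responding


# Each tuple: (population share, response rate, mean income).
# All values are hypothetical.
groups = [
    (0.5, 0.8, 40_000.0),  # group A responds more often
    (0.5, 0.4, 60_000.0),  # group B responds less often
]

print(f"true mean:       {true_mean(groups):,.0f}")
print(f"respondent mean: {respondent_mean(groups):,.0f}")
```

Because the lower-income group answers twice as often in this made-up scenario, the unadjusted respondent average falls below the true average. Statistical agencies use weighting and imputation to reduce this kind of distortion, but those adjustments rest on assumptions that become harder to defend as response rates fall.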

Other survey limitations, including the long production cycles that reduce timeliness of survey-based estimates and small samples that make it difficult to accurately measure characteristics of small groups, have increasingly prompted statistical agencies to explore new data sources.  We will take a look at some of these new data sources in our next post. 

^See Principles and Practices History for further information: https://nap.nationalacademies.org/read/24810/chapter/7