Guest Post: Two Ideas to Support and Reward Data Collectors
Today, we have the fifth post in the series from Claire McKay Bowen and Aaron R. Williams to help diverse audiences understand and support the federal statistical system. Everyone living in the United States is part of this vast statistical ecosystem and benefits from it—both directly and indirectly.
Check out their first post in the series on the uses of public data, from reducing lead exposure in consumer products to improving agricultural productivity. Their fourth post presents a "Day in the Life with Government Data," which details the myriad ways federally funded data suffuse our daily lives.
The collection of data that fuels everything from advanced research to everyday decision making is, by many measures, underappreciated and insufficiently rewarded.
In research, this is because current incentives reward researchers for publications rather than for high-quality data creation. In other words, those who collect and curate data often remain invisible, while professional recognition and career success tend to favor activities such as writing papers and securing grants—again, activities that rely heavily on the very data being overlooked.
As we’ve noted in earlier posts in our federal statistical system series, federal data are largely invisible to everyone—even the researchers who rely on the data and statistics for their work. This piece proposes two active steps the research community, from mathematicians to literary scholars, can take to champion and celebrate data creation and availability as essential for research and evidence-based policymaking.
1. Data Creation Should Be Systematically Tracked
In a perfect world, valuable datasets would be discoverable and reusable, and their impact would be measured automatically. Such automation would ensure that data stewards receive credit for their hard work curating and maintaining data. Achieving this would require proper citation of datasets and tracking them with unique, persistent identifiers. Journal editors should require dataset citations so that the impact of datasets can be tracked systematically, much as we track peer-reviewed publications.
To support better citation tracking, data curators should universally adopt Digital Object Identifiers (DOIs). ICPSR, the world’s largest social science archive, recommends using DOIs to cite data and has long used them to track the data it holds. DOIs offer several benefits:
- Visibility and Impact: Researchers often don’t know that key datasets, like the CDC’s Youth Risk Behavior Surveillance System that tracks diet and sleep trends among youth, even exist. Tracking citations for all data (including public federal data) highlights the importance of these datasets and shows where they are being used.
- Version Control: Updating DOIs when data change (e.g., new collection years or error corrections) builds trust in, and clarity about, how data inform policy, even after those data are discontinued or terminated.
- Tracking: DOIs would make it easier to track the full universe of data released. This would make it more difficult for federal agencies to remove previously published datasets and would make it clearer to users which datasets have been removed.
2. Data Creation Should Be Celebrated and Rewarded
Professional societies could create awards recognizing achievements in data creation, just as they do for outstanding research projects and publications. Such recognition would be especially valuable in the government sector, where these distinctions often carry more weight in promotions and evaluations than peer-reviewed publications do.
Everyone in the U.S. benefits—directly or indirectly—from the federal statistical system. In our first blog post, we showed how federal data have supported meaningful progress, from reducing lead exposure to improving agricultural productivity.
To elevate the value of data, the research community could recognize data and statistics as foundational, not incidental, and acknowledge the work of those who collect, manage, and share the data that make evidence-based research and policymaking possible.
Let’s give data creators the credit they deserve!