Valuable, messy and contentious: How big data became ‘new oil’

Lawmakers have set an ambitious goal for agencies to meet in the coming years — releasing inventories of data to the public in a machine-readable format without jeopardizing privacy or compromising personally identifiable information.

While agency IT officials recognize the Foundations for Evidence-Based Policymaking (FEBP) Act and OPEN Government Data Act present opportunities to get more value out of their data, they also see challenges in preparing the workforce to manage all that data.

“There’s a reason why they call data the new oil. It’s not because it’s so valuable, it’s because it’s so contentious — he who has it has the most power,” Donna Roy, executive director of the Department of Homeland Security’s Information Sharing and Services Office, said Jan. 31 at Veritas’ Public Sector Vision Day in Washington.

DHS, which shares data with homeland security agencies in at least 100 countries, as well as domestic partners in the private sector and other agencies, often negotiates the line between data sharing and security.

“It is a cultural issue much more than it is anything else,” Roy said. “There’s a point-to-point transaction that starts with, ‘I need your data. Why do you need my data? Which pieces do you need? What are you going to do with them? How are you going to protect them?’ And if there’s something bad that happens, how are we going to jointly remediate some issues of data in the wild?”

But beyond those obligatory questions, Roy said data sharing between agencies still raises some challenges — such as whether two or more agencies’ data are measuring the same thing.

DHS has launched an evidence-based policymaking program centered on immigration, but parsing the data and sending it up the chain of command — from agency component offices to Congress — raises some hurdles in communication.

“The components who provide the data into it have worries that when the data on immigration gets reported to Congress, it’ll be different than what their data shows,” Roy said, adding that it could also ultimately impact the agency budgets set up by lawmakers.

Building a ‘data entourage’

The FEBP Act also requires agencies to appoint chief data officers and chief evaluation officers. Both jobs will put agencies in a position where they’ll have to compete with the private sector for in-demand tech talent.

Adding to that workforce challenge, Roy said agencies need to build a “data entourage” around chief data officers.

“I think for every data scientist, there’s probably four or five people that you need there to support them,” she said. “I’m worried about data science, but I’m much more worried about data engineering, data artists or illustrators, data analysts and just the core data janitors,” or people who curate the data, worry about its quality and standardization and make it available to others.

Dorothy Aronson, chief information officer at the National Science Foundation, said she has a “huge amount of confidence” that the agency’s current workforce can re-skill to fill some of these IT-centric positions.

“All of us are using IT technology. It’s not a central IT community anymore, and you don’t have to be born a data expert or data scientist in order to have a problem and need to do data analytics on it,” Aronson said.

Last November, the Office of Management and Budget launched the Federal Cyber Reskilling Academy, aimed at identifying the next generation of IT talent from those already working in the ranks of the federal government.

Aronson pointed to the academy’s launch as a turning point in how agencies approach the tech talent gap.

“Within the community, we’re trying a lot of little experiments to see how we can identify people with innate skills and various capabilities and balance their skills,” she said. “In doing data analytics, it’s much more important to know the business problem than it is to be a statistician.”

With an increase in the availability of user-friendly data tools, Tony Peralta, data architect at the Bureau of the Fiscal Service, said gaining insights into data has never been easier.

“The responsibility now shifts to actually making that information trustworthy, making it clean [and] cleansing it, making sure that there’s a high degree of confidence that when an analyst is getting into the realm of self-service analytics and producing outputs, that they can trust the information they’re engaging with,” Peralta said.

As agencies inventory their data sets to understand what data is vital, Aronson said NSF looks to offer its data “without strings,” navigating data users to the information they need within the organization, and in turn “convincing them that sharing their data will get them some more information” about their own agencies.

AI strategy arms race

The new open-data law also places a premium on data security and privacy, especially as artificial intelligence becomes both cheaper and more sophisticated.

What used to require supercomputer processing a few decades ago, Roy said, now costs “pennies” on the dollar.

“It is now affordable to do the kinds of machine learning [and] AI capability I started working with over 30 years ago,” she said.

The White House’s task force on artificial intelligence expects to release this spring an updated version of the AI research and development plan the Obama administration launched two years ago. However, Roy said the U.S. may have some catching up to do in this space.

“Our adversaries have a machine learning-AI strategy that’s much more aggressive than maybe even ours,” she said. “I don’t know that we have the luxury of arguing this for too long, because our data is going to be used against us or used with us as we move forward.”

Copyright © 2019 Federal News Network. All rights reserved.