Open data in federal agencies has been a priority of the government since President Barack Obama issued an executive order in 2013 to promote transparency. But the risks and fears that arose in the past may now be resurfacing as the U.S. and other countries make the transition from pen-and-paper records to online resources.
Now in 2018, industry stakeholders are worried that some important data may be getting lost in translation.
“There’s intrinsic value to federal agencies for releasing this data,” said Denice Ross, fellow at New America, at a Feb. 27 Center for Data Innovation event. “It reduces their public records request burden, it helps them get ahead of news requests, gives everyone access to the same information so that news agencies can report based on better information … so there’s a lot of value.”
Citizens and federal workers can now tune in around the clock as more agencies move their information to websites. That also means more eyes are on the information the government puts out, including data sets and the results of studies.
With that kind of pressure and new data coming in all the time, protection of both the information and those involved becomes even harder. Ross suggests bringing in more local talent and contractors to find a happy medium when it comes to standardizing how data is kept and disseminated.
“I think the newer a data set is, the more you want to foster that local innovation. And then as the field starts to mature, you want to see standardization start to emerge,” Ross said. “And that can be bottom-up and it can also be top-down. And ideally, they meet in the middle.”
But the risks of relying on data, including data that may be inaccurate or released late, make stakeholders uneasy. Ross said agencies and other organizations that release data to the public have been known in the past to undermine it in a variety of ways:
Shaving off attributes
Releasing fewer data tables
Increasing barriers to access
Releasing data slowly
Changing data schemas so they’re not comparable across years
Aggregating data so it’s not useful
She said the solution is better attribution of where information comes from, and of what companies, agencies and individuals plan to do with the data.
“If the will is not there, the data’s not going to flow as freely,” she said. “So I think that’s why it’s so important for us as citizens not to take data for granted. It doesn’t sprout from Zeus’ head or from Google. It comes from real agencies that are doing real work and better attribution of when people use data, I think will help with that.”
Presentation, access problems
The presentation of certain data, and the purpose behind collecting it, have also raised eyebrows. Paul Farber and Patricia Kim both advocate for agencies and individual groups finding a way to define their relationship with data. This is why they created what they call the Stories Project.
“What is more scientific research going to do if our agencies and you know, different kinds of publics, don’t accept it?” Kim, co-founder of Data Refuge, asked. “So again, bringing it back to this question of open data presentation and access. [They] are important, but also communication and translation. So storytelling I think is really critical.”
The Stories Project wants to create a story bank that tells how federal and state data are important to communities and citizens, focusing on specific states and places where certain information matters. Farber, managing director of the Penn Program in the Environmental Humanities, said building bridges between the data collector and the user is critical to ensuring data is public, accurate and still protected.
He said this will also help agencies see what data they should actually be collecting and presenting to the public.
“Part of where Data Refuge has gone and a number of our conversations have been, a kind of mixed bag when it comes to our kind of federal stances on data, that in one sense open data initiatives still remain,” Farber said. “But a number of the administration’s kind of tactics around data writ large have to do with data that’s proprietary, has to do with data that’s selective; that’s tied to directives from the administration.”
Problems exist on the back end as well, including agencies having difficulty providing enough access. But two main issues have surfaced: budget cuts and lack of staff.
The Census Bureau provides a prime example. The fiscal 2019 budget request will most likely not provide enough resources for the 2020 census. The bureau is making a conscious effort to create an online platform, as opposed to sending collectors door to door. But even that won’t be simple, the panelists agreed.
“As somebody who went door to door collecting this information, that sounds like great news to me. … Anything we can do to automate that, reduce the cost and make it easier to collect that data is good,” said Gavin Baker, assistant director of the American Library Association. “But it’s a big transition, and the Bureau has not had in this 10-year period the funding, the staffing [or] the leadership that it has asked for to be able to make sure that the transition is managed well.”
Baker said public places such as libraries and universities need to have the proper staff to handle an influx of people who will come in the future to fill out their Census surveys.
John Thompson, executive director of the Council of Professional Associations on Federal Statistics, said it’s also getting more difficult to collect data using surveys at all, because costs are rising. Agencies need to find alternative ways to produce data.
“The problem is, they’re having trouble right now because of funding [with] just putting out the products that they have to put out,” Thompson said. “And some products are not being put out as frequently as they should, again because of staffing and funding.”
Some progress has been made, but agencies across the board are being squeezed in their ability to bring on the staff they need to innovate and keep producing high-quality, accurate and open data.
Both the House and Senate have issued versions of a bill that would protect and promote this strengthening of open government and data transparency. However, as Baker noted, they have yet to reconcile their differences.
“I think that shows that there is a high level of understanding on both sides of the aisle that this is a really important issue,” Baker said. “We’re hoping soon that they will merge their differences and send a strong bill to the president’s desk.”