The White House is ushering in a new normal when it comes to the federal government’s data.
The administration’s new policy and executive order, released Thursday, are the forcing functions for making data open and accessible that have been missing over the last decade.
“I think it is a sea shift to some degree,” said Steve VanRoekel, the federal chief information officer, in an interview with Federal News Radio. “I very much embraced the power of open data in the work I’ve done inside government and even in the private sector before coming to government, and it’s not the norm inside to take these approaches. We are still very heavily paper based and valuable data that should be made public are still locked up in these very proprietary systems. And in order to really create a sea change, we need a policy like this. We need the voice of the president issuing this Executive Order telling agencies this sets a new default and this is a change in the way we behave and the way we do things.”
He said it also opens the door a bit wider for the Office of Management and Budget to ensure agencies address open and accessible data as they create new systems or modernize existing ones.
Running hot and cold
The Obama administration over the last four years has been hot and cold when it comes to openness and transparency. President Barack Obama signed the open government directive during his first day in office in 2009. Over the next year, the administration launched Data.gov and the Open Government Partnership internationally. But progress stalled out during the middle of the administration as open government dashboards, blogs and agency plans stagnated for a time.
Even this open data policy was delayed. OMB initially planned to release it in November, according to the Digital Government Strategy, but took six months longer than expected.
VanRoekel said agencies have been moving in the open data direction over the last decade, but not in a way that was as coordinated as it could be.
Mirroring E-Gov Act requirements
In fact, many of the memo’s requirements closely follow what Section 207 of the E-Government Act of 2002 first called for more than a decade ago.
The law required agencies to make data more accessible. It called on an interagency committee to consult the public, conduct studies and share best practices on the most effective ways to access, disseminate and retain government information. It also called for the adoption of standards to make data searchable and interoperable.
VanRoekel said OMB worked closely with agencies on creating the policy as part of the broader Digital Government Strategy effort. He said the strategy as well as the Open Government Directive, the E-Government Act and other policies helped lay the foundation for where the government is heading today.
By far the biggest change and challenge called for in the memo is the requirement for agencies to inventory their public datasets and maintain a public listing of them, said Patrice McDermott, director of OpenTheGovernment.org, a good government group.
“From the public’s perspective, if they could get the agencies to actually do these enterprise inventories and the public listings of what data they think should be publicly available and is, that would be really important,” she said. “One of the things when they did their open government directive was they told agencies to release three high-value datasets and then they asked the public to help them decide what they were. We kept saying, ‘We don’t know what we don’t know. There are no lists of what your datasets are. There’s no data dictionaries attached to the ones up there. We don’t know what they are. We don’t know what they do.'”
VanRoekel said the inventory will be important to understanding where agencies fall short of complying with the other parts of the policy.
“In the Executive Order, the president lays out the groundwork for us to a cross-agency priority goal,” he said. “So we are going to get a working group together and then produce a set of deliverables and an agency goal so we will put milestones based on the inventories we get in to say here is where we need to go and take this forward. There has indeed been good progress out there. I think modern systems are largely starting to adopt this, but it has permeated into the far corners.”
VanRoekel added the focus will be on new or modernized systems because of the cost of moving legacy data into machine-readable formats.
Wide praise for memo
Another big change for agencies is public engagement. OMB wants them to create a page to solicit advice on prioritizing the release of data and on which formats are most usable.
The memo called on departments to make data available in multiple formats, including through APIs.
McDermott and other good government groups and industry associations praised the memo and executive order.
John Wonderlich, a policy director at the Sunlight Foundation, wrote in a blog post that the new policy moves “beyond vague aspirations” of previous open data efforts.
Wonderlich wrote, “By requiring agencies to publicly list all their data that could be made public, the President is not just reaffirming that decisions about disclosure should be based on the public interest, he’s also giving the public (and Congress) tools to enforce them. When open data procedures are incorporated into agency processes from the start, we’ll start to see more systems designed for bulk access from the start, and we’ll be better able to recoup all the missed opportunities in legacy datasets that are still closed. We’ll be able to evaluate agencies’ transparency against what they’ve defined as their candidates for release, and clearly identify areas where agencies avoid disclosure altogether.”
Kevin Richards, TechAmerica’s senior vice president of federal government affairs, said in a release that the policy closely follows the recommendations from the organization’s Big Data Commission.
“Access to the monumental amount of government data will fuel untold numbers of new innovative ideas in this country,” Richards said. “By making open data the default policy of the entire federal government instead of discretionary, President Obama has handed the U.S. technology industry a key to expand our global leadership in this era of big data.”
Just structured data?
McDermott said there are some concerns, especially around OMB’s decision to define data as only structured data. She said some estimate that unstructured data makes up 70 to 90 percent of all data.
“That leaves a lot of stuff out of the purview of this policy so we are a little concerned about that,” she said. “They say you can take an unstructured thing and break it into component parts and basically do meta-tagging of it. But it’s troubling that is on a site they created and a definition on that site and truncated it for the purpose of this policy.”
McDermott said the policy also doesn’t define “information systems.” She said there is a definition of information system in Circular A-130, but it’s not incorporated into the policy document.
She also said OMB’s reference to the mosaic effect in the memo causes some concern. The mosaic effect is the idea that multiple datasets that contain no sensitive information on their own can, when brought together, reveal a picture or identify personal information.
McDermott said the term had no statutory or regulatory definition prior to this policy.
Mosaic effect not a big deal
VanRoekel said agencies will determine what datasets are structured or unstructured.
“If you look at the way this lays the foundation for the way we treat just structured data, there’s an easy conversation around how do you now build solutions on top of data or how do you treat unstructured data,” he said. “We want to have those conversations happen, and this is the beginning point of them.”
As for the mosaic effect, VanRoekel said they referenced it as more of a term of art that is emerging from the big data discussions.
He said they’ve talked about the mosaic effect a lot as part of the work in the health field to make data more accessible.
An agency may have to justify holding datasets back because of the potential or real security or privacy issues that arise from releasing them.
“We’re going to have those discussions. I think the data that we have, that could be made public, that is very easily made public that there wouldn’t be a lot of deep conversations about why not to make them public except they aren’t today,” he said. “This policy will allow us the leverage to go do that, and to drive publication of it. I don’t think the mosaic effect will come into play that much just because the wealth of data we have that lives outside the realm of personally identifiable information.”