The Foundations for Evidence-Based Policymaking Act has ordered agencies to share their datasets internally and with other government partners — unless, of course, doing so would break the law.
Nearly a year after President Donald Trump signed the bill into law, agencies still have only a murky idea of what data they can share, and with whom. But soon, they’ll have more nuanced options for ranking the sensitivity of their datasets before sharing them with others.
Chief Statistician Nancy Potok said the Office of Management and Budget will soon release proposed guidelines for agencies to provide “tiered” access to their data, based on the sensitivity of that information.
The guidance was required under the Evidence Act, with the goal of encouraging agencies to share more of their data in situations where permission to do so has been unclear.
“In the past, it’s been kind of binary — it’s either open or it’s protected — and now we’re asking agencies to make a lot more of their data open. Well, this isn’t like an on/off switch,” Potok said Wednesday at the Data Coalition’s GovDatax conference.
OMB, as part of its Evidence Act rollout, will also rethink how agencies ensure protected access to data for research. Potok said agency officials expect to pilot a single application governmentwide for people seeking access to sensitive data not available to the public.
The pilot resembles plans for a National Secure Data Service envisioned by the Commission on Evidence-Based Policymaking, an advisory group whose recommendations laid the groundwork for the Evidence Act.
“As a state-of-the-art resource for improving government’s capacity to use the data it already collects, the National Secure Data Service will be able to temporarily link existing data and provide secure access to those data for exclusively statistical purposes in connection with approved projects,” the commission wrote in its 2017 final report.
To strike a balance between access and privacy, Potok said, OMB has also asked agencies to list the statutes that prohibit them from sharing data among themselves.
“We’re looking at what does this mean in practice: this ‘Yes, you make it available unless there’s a statute prohibiting it,’” she said.
To some degree, the OMB guidance would extend to the rest of government the data-use best practices that federal statistical agencies have already internalized.
“The Evidence Act offers us a chance to change the conversation to say, ‘Why now?’” said Christine Yancey, the Labor Department’s chief evaluation officer. “This is really just about changing behaviors and attitudes towards the data privacy conversation, and I think the Evidence Act offers us a great entrée into the alteration of the conversation.”
Meanwhile, the National Science Foundation has looked at different ways to release data without disclosing the identities associated with that information.
Dorothy Aronson, NSF’s chief information officer and chief data officer, said scrubbing identifying information from datasets would make it easier for artificial intelligence algorithms and private-sector companies to use the data.
“When we collect data in the research agencies, we have a whole lot of rules around the data that we’ve collected, and so, even though it looks interesting and it seems like we could really match it up with a bunch of other stuff, we are not allowed to do that,” she said.
Aronson said the agency should have more open conversations about data access to make future data-sharing partnerships easier.
“I’m a little skeptical about the release of the data. I like that we can, but I also feel, realistically, like there’s a lot of barriers,” she said. “We need to build our systems in such a way that we know from the beginning whether the data is going to be available for release or not.”
Even when open data is available, Aronson said, NSF employees have been reluctant to use it because of concerns about its quality and reliability.
“Those things, I think, we have to more or less lower the bar for ourselves, and let us use data that might be of a B-quality as long as we explain that it’s of a B-quality,” she said. “We have to say this data may or may not have errors, so the information that comes from it is 80% accurate, or something like that.”