Some of the leading thinkers in the field of data and how to use it have proffered the idea of a national data service. The Data Foundation says such a service would support the goal of evidence-based decision making. Joining Federal Drive with Tom Temin with how a data service would be organized and what it would do, the former chief statistician of the United States and Data Foundation member, Dr. Nancy Potok.
Dr. Nancy Potok: Thanks so much. I love being on your show.
Tom Temin: And tell us about this idea. You and Nick Hart of the Data Foundation authored this idea of a national secure data service — tell us what that would be in your mind.
Dr. Nancy Potok: We’re building on the recommendations of the US Commission on Evidence-Based Policymaking. I was one of the commissioners, Nick was the policy director for the commission. It was bipartisan, and in 2017 we submitted 22 recommendations to Congress, and prominent among them was to set up a national secure data service which would serve as a place for linking the most sensitive data that the government has, and using it for statistical purposes. So this is not open data, it’s not operational data, it’s not something that would be used, let’s say, for law enforcement. It’s to create data that would provide insights into some of our most pressing policy issues, but de-identified so that no individual person’s data could be accessed or used for any purposes. So it’s to provide these broad insights.
Tom Temin: And this data is being collected already by federal agencies?
Dr. Nancy Potok: Yeah. So one of the drivers of trying to be more efficient in improving statistical data is that the traditional method is to go out and do a survey, and they range from the most expensive billions of dollars to do the 2020 Census, to smaller surveys, but they’re more and more expensive and it’s a big bother for people to have to answer surveys, they don’t like to do it. So reusing information that people have already provided to the government is a much more efficient way to put these statistical databases together.
Tom Temin: So an example might be, and I’m just making this one up, but all of the small businesses that got loans under the Payroll Protection Plan in the pandemic, there’s tons of data about small businesses — where they are, what they do, how many employees they have, etc — that could be de-identified in terms of the individuals and somehow used for understanding what policy should be toward small business better.
Dr. Nancy Potok: Absolutely. And you want to look at sometimes what happens over time with businesses. So you might be able to look at, for example, new businesses that were created right before the pandemic shut them down. And then you can sort of see, well what happened to them? Did they reopen? Did they survive? You can see maybe characteristics of the businesses that did better than others. But there’s actually a lot of other applications of this. For example, before the pandemic hit, but I think it’s going to be a big issue with the high unemployment rate that we have right now, is education and workforce. So you want to know if you can look at people’s lifetime earnings, for example, and tie that to the education that they had. Do you need a two year education, a four year education? Does your geographic location matter? What about these professional certifications? What are the job training programs that are most effective in raising people’s lifetime earnings? Those data are scattered all over between different agencies, some of them the states have. And so this would be a way of bringing it together in a very secure controlled environment, and treating it like statistical data is, like census data is, with legal protections that protect the confidentiality so you can start to get answers to those questions. And then have effective public policies. If you’re spending tax dollars, you want it to be effective to get the best bang for the buck.
Tom Temin: From an organizational standpoint, how do you envision this data service being set up and organized? Where would it live somewhere in the bureaucracy?
Dr. Nancy Potok: Yeah, we looked at several options. We came up with what we thought were the right characteristics in terms of transparency, privacy, protection, oversight, independence from political influence on the data. And we have four options, and the option that we settled on that we thought was actually going to hit all of those attributes was to set up a federally funded research and development center, otherwise known as an FFRDC, which many people in the federal government would be very familiar with — and place it with NSF, the National Science Foundation so we would have something that was a public private partnership. But the National Science Foundation has really good relationships with the research community, they’re science oriented, there would be a measure of independence for integrity of the data, and they’re used to working with sensitive data. And there would be the flexibility with an FFRDC being contracted for outside of government to be able to have the scalability that you’d want to grow fast enough and to attract and pay the skilled employees that you need to do this kind of work. It’s highly technical, and quite honestly, the federal government has trouble recruiting and retaining people at the salary levels that you need and it takes forever to hire people in the federal government. So having an FFRDC would give that kind of flexibility and the National Science Foundation is where we’d like to see that happen.
Tom Temin: I imagine there would be a pretty heavy operational aspect to this because the data has to be maintained.
Dr. Nancy Potok: A really important point is this is not a data warehouse — this is a linking service. So agencies will keep their own data, that’s much more secure. And what would happen is if there are particular parts of the data that you want to see, that would go to this service, they would link the records together, de-identify them and make them available in a very controlled environment to approved researchers for specific projects. When that’s done, the data is not stored in a data warehouse. It’s de-linked and it stays with the original agency, and that was what the commission envisioned. A data warehouse is sort of yesterday’s technology and not secure enough to do this kind of thing.
Tom Temin: Well, yeah, and it gets expensive and there’s a lot of garbage that kind of collects over time that you really don’t want with it. But that becomes incumbent on the agencies to keep their data to standards that would be useful to the data service and to the integrity of what it is they’re doing.
Dr. Nancy Potok: That’s absolutely right. And in fact, before I left government, I was co-chair of the Federal Data Strategy with the federal CIO. And we felt very strongly that agencies needed to really step up their game in terms of their inventories, in terms of assessing data quality, in terms of standardizing certain things, the metadata they were keeping. And so these are all complementary. The types of data that would go to a national secure data service, you would want to really be high priority data that the federal agencies are focusing on as part of the Federal Data Strategy. These are connected.
Tom Temin: So many of these sources exist already because I’m thinking of not just the PPP program and all the Small Business Administration data, but Agriculture has the SNAP and TANF programs. When you think about it, there’s limitless links that the service could really draw on.
Dr. Nancy Potok: Right. Another example was HUD when they were looking at homelessness and public housing, they did a study where they linked together the public housing data. And they looked at earnings data, at income, at education, to see did it make a difference what type of public housing people were in. And they found, for example, when they brought all of those together, and they looked at child welfare programs, that housing vouchers were actually very cost effective, and worked well to change the lifetime trajectories of people. And so that was an important program that they went on with. So it’s all kinds of data really, that we’re talking about.
Tom Temin: So the proposal is out now from the Data Foundation, what happens with it?
Dr. Nancy Potok: Well, so one of the things that the Foundations for Evidence-Based Policymaking Act of 2018 did was it established an advisory committee to look at this issue. That was the legislation that enacted 11 of the commission’s recommendations. So there’s a Federal Advisory Committee, the members have all been appointed. Had I stayed with government as the chief statistician, I would have been chairing that committee, but the next chief statistician will do that. And what we did was we made recommendations for what that advisory committee should come up with to look at some governance and oversight issues and things like that. In the meantime, what we’d like is for the National Science Foundation and for Congress to look at this seriously, and give the money to the National Science Foundation to do a contract, go out and compete for an FFRDC to start providing the service.
Tom Temin: Alright, let’s hope they’re listening. Dr. Nancy Potok is co-author of the Data Foundation’s proposal for that national data service and she’s former chief statistician of the United States. As always, thanks so much for joining me.
Dr. Nancy Potok: Well, thank you very much. I love talking about this.