Agencies are laying the foundation for artificial intelligence and automation tools by first getting a handle on their vast stores of data.
The major task of inventorying data and improving data maturity at agencies is a fundamental step toward fielding AI and automation tools across the federal government.
But in the process of completing this work, agency data leaders say they’re having an easier time getting the right data to the right people at the right time.
Damian Kostiuk, the deputy chief data officer at U.S. Citizenship and Immigration Services, told Federal News Network during a recent panel discussion that his office is looking at ways to use data to improve the agency’s customer experience.
“If we’re going to be able to do any of the AI/ML projects, and actually just solve a lot of basic problems around the agency, even before you get to ML, and AI, you’ve got to have the data to solve the problem,” Kostiuk said on Nov. 17 at ATARC’s AI and Data Summit.
USCIS is specifically looking at how data can improve the digital experience on its website, and how to facilitate and expedite the processing of its caseload.
But to help get this work in motion, Kostiuk said his office has been cataloging agency memorandums of understanding that set data sharing and data management policy.
“Sometimes it’s actually kind of disorganized in the past, and it’s been all over the place. But we’ve done a lot of due diligence to try and consolidate that, make sure there’s good controls over it,” he said. “When MOUs have been signed, and we would negotiate them, they’ve always included huge elements on data standards, and data quality requirements … In particular, we’re really emphasizing trying to get data standards across all DHS and the federal space.”
Data management at USCIS also requires setting up clear guardrails on how certain data sets may be used.
“We have plenty of data that can only be used for non-law enforcement purposes, and we have data that can only be used for law enforcement purposes. They have to be very much bifurcated, they cannot see each other. So there are natural limits, per se, to what you could do, but we adhere adamantly to that. And as a consequence, you have to have very good data management in order to make sure that you don’t spill and have those two bits mingle,” Kostiuk said.
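Kostiuk didn't describe the agency's implementation, but the guardrail he outlines is a familiar data governance pattern. As a purely hypothetical sketch in Python: tag every dataset with its permitted purpose, and refuse any operation that would let the two categories mingle.

```python
# Hypothetical illustration of purpose-based data segregation, not USCIS code:
# each dataset carries a usage tag, and any attempt to combine datasets with
# conflicting tags is rejected before the records can mingle.
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    LAW_ENFORCEMENT = "law enforcement"
    NON_LAW_ENFORCEMENT = "non-law enforcement"


@dataclass(frozen=True)
class Dataset:
    name: str
    purpose: Purpose
    records: tuple


def join(a: Dataset, b: Dataset) -> Dataset:
    """Combine two datasets only if their permitted purposes match."""
    if a.purpose is not b.purpose:
        raise PermissionError(
            f"cannot mix {a.name} ({a.purpose.value}) "
            f"with {b.name} ({b.purpose.value})"
        )
    return Dataset(f"{a.name}+{b.name}", a.purpose, a.records + b.records)
```

In a real system the check would live in the data platform's access layer rather than in application code, but the principle is the same: the restriction is enforced by metadata attached to the data, not by convention.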
While few agencies have rolled out AI tools beyond limited pilots, Kostiuk said USCIS, by improving data maturity, has improved the “time to market” of getting the right data to the right people at the right time.
“In the past, if you had to spend nine months just trying to work on cleaning up the data before you could even get your project, you’re not exactly going to be helping the American people very quickly,” he said.
Kostiuk said the current state of the agency’s data makes it possible to complete tasks in weeks that would have previously taken months or years.
“Because the data quality was so much better, because the data-sharing agreements were in place, and we had connections across all the other kinds of immigration space, we were able to get projects done in two weeks, which I swear would have been at least nine months in the previous era … Critically, the data were good coming in, and so we can make good decisions quickly and get the information we needed, whether it was statistical to leadership for making policy decisions, or to operators to actually get the job done,” he said.
Suman Shukla, the data management section head of the U.S. Copyright Office’s product management division, said the agency has used optical character recognition to digitize some of its records, but is looking at AI tools to fast-track this workload.
The U.S. Copyright Office has 41 million records in its card catalog, a mix of typed and handwritten cards.
“The biggest challenge for us is to capture those images, extract the metadata and do a real-time keyword search to figure out which information we are looking for,” Shukla said. “People who have done the copyright work, they do not have to come physically in the building to pull the drawers and look at the card to find out what work has been done.”
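Shukla didn't detail the office's tooling, but the workflow she describes, scanning a card image, extracting its text and indexing it for keyword search, can be sketched in a few lines. The example below is a hypothetical illustration assuming the open-source Tesseract engine via the pytesseract library; the directory name and search term are made up.

```python
# Hypothetical sketch of an OCR-and-search pipeline for scanned catalog cards.
# Requires Tesseract plus the pytesseract and Pillow packages; paths are made up.
from pathlib import Path

import pytesseract
from PIL import Image


def extract_text(card_image: Path) -> str:
    """Run OCR on one scanned catalog card and return its raw text."""
    return pytesseract.image_to_string(Image.open(card_image))


def build_index(card_dir: Path) -> dict[Path, str]:
    """OCR every card image in a directory into an in-memory index."""
    return {p: extract_text(p).lower() for p in sorted(card_dir.glob("*.png"))}


def keyword_search(index: dict[Path, str], term: str) -> list[Path]:
    """Return the cards whose extracted text contains the search term."""
    term = term.lower()
    return [path for path, text in index.items() if term in text]


if __name__ == "__main__":
    index = build_index(Path("scanned_cards"))    # hypothetical directory
    for hit in keyword_search(index, "renewal"):  # hypothetical query
        print(hit)
```

At 41 million cards, a production system would push the extracted text and metadata into a search engine rather than an in-memory dictionary, and the handwritten cards are the harder case, which is one reason the office is eyeing AI tools beyond conventional OCR.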
Shukla said the U.S. Copyright Office needs access to decades’ worth of data whenever a copyright claim is contested in court. Copyright protections in the U.S. cover the lifespan of the author, plus an additional 70 years.
“We have an immediate emergency to provide all kinds of information related to that work in court,” Shukla said. “Our data cannot just be archived and left behind, it has to be archived in a way that we can retrieve at any point in time when needed for such situations.”
Shukla said the U.S. Copyright Office conducted a data management initiative to understand the current state of its data. The initial analysis has shed light on what data the agency has and where it’s stored.
Shukla said the analysis has also helped the agency sunset some of its legacy systems and move the data to newer systems.
“There’s data you can just publish online [under] free open data policy. There could be some information that has FOIA-able information, but is not readily available to the public. There’s information that’s for agency-only sharing or team sharing. Or there’s classified information, so you cannot share it all,” Shukla said.
Alexis Banks, an IT specialist with the EPA’s Office of Chemical Safety and Pollution Prevention, said her agency is looking at AI to flag names and signatures on archived documents.
“It’s very important for folks to get that data on time, so they can make impactful decisions. Now there’s this whole cycle of how we do that. We collect the data, we have to clean the data, we have to organize the data. But for certain agencies, we have that data placed in all these different places that we need to be able to compile them all in one place, so that we do have a quicker system. It’s just how do you do that. And so moving forward, we have minimized the time, and that’s the whole point, trying to get things to work faster,” Banks said.
Xuan Pham, a senior actuary at the Agriculture Department’s Risk Management Agency, said the agency invested early in data management, and is using that data to make sure crop insurance payments get to farmers more quickly.
“If a farmer was going to come into a county office, and spend an entire day filling out an application by paper, that could take hours and hours of work. What we’ve done is that, because we have that data, we pre-fill the application for them. All they have to do is look at it, check it, make sure it is correct, and that’s it. So instead of taking months and months, we were able to reduce that time down to weeks, and that is huge,” Pham said.
The agency also provides machine-readable, county-level data every week on the causes of crop loss.
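Pham didn't specify the format, but machine-readable here means the files can be consumed directly by analysis code. A minimal sketch, assuming a local CSV export with made-up column names standing in for the agency's actual field layout:

```python
# Hypothetical sketch of summarizing county-level cause-of-loss data.
# The file name and column names are stand-ins, not RMA's actual layout.
import pandas as pd

losses = pd.read_csv("cause_of_loss.csv")

# Total indemnity by county and cause, largest losses first.
summary = (
    losses.groupby(["state", "county", "cause_of_loss"])["indemnity"]
    .sum()
    .sort_values(ascending=False)
)
print(summary.head(10))
```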
“We as an agency have built really a structure to collect that data at the field level, and to be able to know exactly what’s going on. And so, because of that, we benefit from it a lot,” Pham said. “We benefit a lot from having that foresight, that leadership that has happened over multiple decades.”
Jory Heckman is a reporter at Federal News Network covering U.S. Postal Service, IRS, big data and technology issues.