Bureau of the Fiscal Service explores ‘data lakehouse’ concept

The Bureau of the Fiscal Service is simplifying its data footprint to make that data more secure.

Justin Marsico, the bureau’s chief data officer, said in a recent interview that the bureau is focused on four goals to modernize its data footprint.

The bureau, Marsico said, is focused on making sure its data remains transparent and accessible to the public. It’s also using data for analytics and to answer questions that will make the bureau more effective.

Finally, the bureau is looking to better understand the underlying infrastructure that stores its data, and how it can enable data-sharing across the organization.

To achieve all of these goals, the bureau stood up a Data Governance Council that brings together executives from across the bureau to resolve data issues.

The council, however, isn’t starting from scratch. The bureau completed an inventory of its data a few months ago, as part of a Treasury Departmentwide exercise led by the chief data officer.

Now that the bureau has an inventory of its data, Marsico said the next step is to determine the overall maturity of the data.

“We can start asking questions about the data, like how complete is the metadata? Do we have an understanding of the data quality of each of the data assets that we’ve identified? That will give us a roadmap for what we should be addressing, standardizing metadata, coming up with a data quality improvement plan, looking for opportunities to find standards or areas where we should be standardizing data across the enterprise, and then beginning to implement those standards,” Marsico said.
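As a rough illustration of the metadata-completeness question Marsico describes, the sketch below scores each inventory entry by the share of required metadata fields that are filled in. The field names in `REQUIRED_FIELDS` and the sample inventory entries are hypothetical, not the bureau's actual standard or data.

```python
# Hypothetical required metadata fields; not the bureau's actual standard.
REQUIRED_FIELDS = ("title", "description", "owner", "update_frequency")


def metadata_completeness(asset: dict) -> float:
    """Return the share of required metadata fields that are filled in."""
    filled = sum(
        1 for field in REQUIRED_FIELDS if str(asset.get(field) or "").strip()
    )
    return filled / len(REQUIRED_FIELDS)


# Made-up inventory entries, for illustration only.
inventory = [
    {
        "title": "Example dataset A",
        "description": "Sample description",
        "owner": "Team 1",
        "update_frequency": "daily",
    },
    {"title": "Example dataset B", "description": "", "owner": None},
]

# Sort so the least complete assets come first.
ranked = sorted(inventory, key=metadata_completeness)
for asset in ranked:
    print(f"{asset['title']}: {metadata_completeness(asset):.0%}")
```

Ranking assets this way is one possible input to the kind of roadmap described in the quote: the entries with the lowest scores would be the first candidates for metadata cleanup.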

The Foundations for Evidence-Based Policymaking Act, enacted in 2019, requires agencies to consider their data open by default while managing the security of sensitive information. To put the law into practice, the Data Governance Council recently approved a policy outlining steps the bureau should take to meet the goals of the Evidence Act.

“It’s not enough for agencies to just put their data out there, if their data exists in a PDF or a Word document. That’s not the right way to go about doing things, so we’ve established some standards for ourselves that we have to follow. Our data has to be machine-readable, that means that it has to be structured in a way that is easy for analysts to actually use. Our data has to have metadata, it has to have information that says what the data is. And that really helps people to understand what they’re looking at and make sure they’re making the right decisions about it,” Marsico said.

Marsico said breaking down IT silos and simplifying the bureau’s data footprint will improve the security of that data, make artificial intelligence and machine learning algorithms more effective, and save money by reducing the need to pay for multiple tools and licenses.

The bureau, for example, is looking at using AI to “pre-process” the text of the annual appropriations spending bills from Congress, in order to get the approved funds to agencies faster.

But for the bureau to reap the maximum benefit from automation, Marsico said it will need to bring all of its data into one place.

“When you bring all of your data together, you can think like an enterprise. So instead of just being able to analyze or try to build an AI or machine learning algorithm on the data inside of the one system that I might have access to, now you have access to the intelligence of the organization, and that will help us build better and smarter models,” Marsico said.

In terms of what a more modern data infrastructure should look like, Marsico said the bureau is exploring the concept of a “data lakehouse” – a combination of a data lake and a data warehouse.

A “data lake” refers to a repository of unstructured data, such as images or a PDF of a check that’s potentially useful for AI or machine learning, whereas the “house” part of the analogy is focused on structured data that’s organized into rows and columns, as well as semi-structured data.

“The data lake part of the lakehouse allows for all those different types of data to come into our environment. And then the house, or the warehouse part of it, allows us to be smart about how we are structuring the structured data, so that it’s easy for us to do queries, and to do reporting off of that. This is an area where we’re just getting started, but we’re really excited about the possibility of implementing this type of approach,” Marsico said.
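The split between the two halves can be pictured with a toy sketch: raw files are stored as-is on the “lake” side, while a structured table on the “house” side supports queries and reporting. Everything here is an assumption made for illustration, including the file name, the `payments` table and its columns; real lakehouse platforms work differently under the hood, and this is not the bureau's design.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_toy_lakehouse(root: Path) -> sqlite3.Connection:
    """Create a raw-file area and a structured table (illustrative only)."""
    # "Lake" side: unstructured files are stored unchanged.
    lake = root / "lake"
    lake.mkdir()
    (lake / "check_0001.pdf").write_bytes(b"%PDF-placeholder")

    # "House" side: structured rows and columns that are easy to query.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments ("
        "payment_id INTEGER PRIMARY KEY, "
        "amount_cents INTEGER, "
        "source_file TEXT)"
    )
    conn.executemany(
        "INSERT INTO payments VALUES (?, ?, ?)",
        [(1, 12500, "check_0001.pdf"), (2, 7500, "check_0001.pdf")],
    )
    conn.commit()
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = build_toy_lakehouse(Path(tmp))
    # Reporting query against the structured side.
    total = conn.execute("SELECT SUM(amount_cents) FROM payments").fetchone()[0]
    # The raw files remain available for other uses, such as model training.
    raw_files = sorted(p.name for p in (Path(tmp) / "lake").iterdir())
    conn.close()

print(total, raw_files)
```

The `source_file` column is one simple way to link a structured record back to the raw file it came from, so both halves can be used together.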

The next thing the bureau is looking to tackle is having clear guidelines for sharing data inside of the Fiscal Service. The Data Governance Council has the ability to make a final determination on whether a data set is shared from one part of the Fiscal Service to another, but Marsico said the bureau is just getting started in trying to execute on this work.

“There’s a lot of confusion about when it’s OK to share data, and when it’s not OK to share data … We want it to be really clear for an analyst to be able to know what the process is, for that person to have access to a certain data set and to be able to use it to help the business,” he said.

The push to transform data at the bureau isn’t just focused on infrastructure; it’s also focused on culture. Marsico said he wants to improve the quality and supply of agency data, but also to increase demand for this data from the bureau’s workforce.

“It’s important for our workforce to be asking questions that can be answered with data. So we want people around the bureau to be asking questions like, ‘What type of metrics should I be looking at to make my program run better, run faster, run more efficiently? What impact is my program having on increasing equity?’ As people start to ask those types of questions, that creates demand for data,” Marsico said.
