Bureau of the Fiscal Service moving toward open ‘data lake’ powered by cloud


Amid a push to make government data more accessible, the Treasury Department’s Bureau of the Fiscal Service is moving away from structured data silos in favor of an open “data lake” approach.

“In order for us to get full utility of our data, we really needed to be able to create a platform where we can actually use all of it,” Tony Peralta, the agency’s data architect, said in a Jan. 31 interview following Veritas’ Public Sector Vision Day in Washington, D.C.

The data lake, an inventory of raw information, has played a crucial role in helping the bureau better assist people who contact its call centers looking to settle delinquent debts.


“One of the things that we’re looking to do with building a big data lake is basically the ability to take voice files, transcribe them, and be able to get sentiment analysis from that information,” Peralta said, both to help the agency meet its mission objectives and to help meet the goal outlined in the President’s Management Agenda to use agency data as a strategic asset.

The bureau relies on the Workforce Community Cloud (WC2), a shared-service cloud architecture provided by the Treasury Department’s Office of the Chief Information Officer, to ingest the call audio, transcribe it to text and run the sentiment analysis.

Before moving to the cloud, the bureau faced constraints in conducting this level of analysis.

“The traditional enterprise data warehouse, more often than not, had analysts engaging data and pulling it down, doing their own scrubbing, then trying to run the model on a laptop or a workstation that is usually bound by its configuration. So you only had so much memory that you can execute. You only have so much hard-drive space that you can actually perform the analysis,” Peralta said. “The inefficiencies were very clear.”

The bureau’s cloud environment, which powers its data lake model, allows bureau employees to gather more data-driven insights.

“Now we have elastic compute. We can actually spin up clusters of our enterprise data warehouse footprint,” Peralta said. “We can build those things up to our needs and we can obviously tear them down. And being good stewards of taxpayer dollars, really maximizing the flexibility of cloud infrastructure and services.”

With this sentiment analysis, the bureau looks to ensure a high level of customer service in its call center support.

“While we want to use that information to actually help us achieve our mission, we also, of course, want to use it to better serve the citizen,” Peralta said. “We want to know when we’re not hitting key pieces of information, where sentiment analysis may indicate we need to train some staff up to really provide a better service to the citizen.”

The bureau also looks to use the analysis to see whether certain actions by call center staff correlate with callers resolving their delinquent debts.

“It’s just really leveraging our information strategically to basically say, how can we improve overall? So we’re guaranteeing to a certain extent that the citizen engaging with us are really getting a positive experience throughout, throughout their debt resolution conversations with our agents,” Peralta said.

However, he cautioned that moving to a cloud-powered data lake won’t serve as a “silver bullet” for solving an agency’s data management problems.

“Unless you have the right governance — the right understanding of what’s going in, what’s going out, how your data is defined, how it can be used, how it can’t be used — that data lake quickly becomes a data swamp,” Peralta said. “Governance is key — cataloging information so that the folks that are engaging with this diverse pool of information really understand the value that they can obtain from it.”

The need to build strong foundations in data management becomes more critical as more agencies pursue pilot programs in artificial intelligence and machine learning. 

At the same time, landmark legislation like the recently passed Foundations for Evidence-Based Policymaking Act also emphasizes the need for open data to remain secure and untraceable to personally identifiable information.

“Our bottom line is public trust,” Peralta said. “It’s important for us to ensure that the data that we collect is basically used for the purposes for which it was collected.”

Copyright © 2019 Federal News Network. All rights reserved.