On May 14, the Office of Management and Budget announced the graduation of the first cohort of federal employees enrolled in the Federal Data Science Training Program. By developing talent and improving data science skills, the program will help meet the goals of the Federal Data Strategy and supply a continuous stream of data scientists, laying the groundwork for agency leaders to use data as a strategic asset.
For most agencies, using data effectively starts with data readiness and governance—making data easier for the federal workforce to search and use. As Forrester estimates, between 60% and 73% of all data within an enterprise goes unused for analytics.
As data proliferates in the government, many organizations are starting to focus on how data readiness can ensure better mission outcomes. For example, the Department of Defense is looking for ways to help its personnel and government users prepare data for use in AI applications. The department’s most recent Data Readiness for Artificial Intelligence Development (DRAID) RFP aims to “create the troves of AI ready data that will power the transformation of the DoD through AI.”
Civilian agencies will soon follow the DoD’s lead. IT leaders at agencies across the government should focus on a strategy that balances data integration, governance and workflow automation to accelerate data readiness and meet the goals of the Federal Data Strategy.
Improve readiness through integrated data preparation
Data readiness is a critical step toward building advanced analytics and AI/ML capabilities. But it is a complicated process for any organization, large or small. To accelerate data readiness for more user types and use cases, agencies must first address data preparation challenges.
Data preparation includes the sequence of processes that take data from its original source through profiling, transformation, integration, cleansing and enrichment until it is in a ready state. Although these data preparation processes depend on each other, in practice they are frequently disconnected, creating bottlenecks that cause inefficiency and delay.
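As a rough illustration of how dependent preparation steps can be chained rather than run in isolation, the sketch below passes records through cleansing, transformation and enrichment in order. The field names (`id`, `state`, `region`), the lookup table and the functions themselves are hypothetical and stand in for whatever an agency's own tooling provides.

```python
from typing import Callable

Record = dict[str, object]
Stage = Callable[[list[Record]], list[Record]]


def cleanse(records: list[Record]) -> list[Record]:
    """Drop records with no id and trim whitespace from string fields."""
    cleaned = []
    for record in records:
        if record.get("id") is None:
            continue
        cleaned.append(
            {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        )
    return cleaned


def transform(records: list[Record]) -> list[Record]:
    """Normalize the hypothetical 'state' field to upper case."""
    return [{**r, "state": str(r.get("state", "")).upper()} for r in records]


def enrich(records: list[Record]) -> list[Record]:
    """Add a region from a small, made-up reference table."""
    regions = {"VA": "South", "NY": "Northeast"}
    return [{**r, "region": regions.get(str(r["state"]), "Unknown")} for r in records]


def run_pipeline(records: list[Record], stages: list[Stage]) -> list[Record]:
    """Run each stage on the output of the previous one."""
    for stage in stages:
        records = stage(records)
    return records


raw = [
    {"id": 1, "state": " va "},
    {"id": None, "state": "ny"},  # removed by cleanse: no id
    {"id": 2, "state": "ny"},
]
ready = run_pipeline(raw, [cleanse, transform, enrich])
```

Because every stage takes and returns the same shape of data, stages can be reordered or extended (for example with a profiling step) without rewriting the others.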
To address these bottlenecks, organizations need technologies and practices that knit together the whole sequence without restricting user flexibility. Agencies can use data catalogs to improve cohesion between the different stages of data preparation. A catalog can locate all data appropriate for a given project and help resolve inconsistencies driven by siloed data preparation practices.
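To make the catalog idea concrete, here is a minimal in-memory sketch in which data sets are registered with descriptive tags and a project team looks up everything matching its needs. The `CatalogEntry` and `DataCatalog` classes, the data set names and the storage locations are all invented for illustration; a production catalog would add lineage, schemas and access controls.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    owner: str
    location: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """A toy catalog: register data sets, then search them by tag."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, *tags: str) -> list[CatalogEntry]:
        """Return entries carrying every requested tag, sorted by name."""
        wanted = {t.lower() for t in tags}
        matches = [
            e for e in self._entries.values()
            if wanted <= {t.lower() for t in e.tags}
        ]
        return sorted(matches, key=lambda e: e.name)


catalog = DataCatalog()
catalog.register(CatalogEntry("claims_2021", "Benefits Office", "s3://example/claims", {"benefits", "pii"}))
catalog.register(CatalogEntry("facility_locations", "Facilities", "s3://example/sites", {"geospatial"}))
catalog.register(CatalogEntry("benefit_rates", "Benefits Office", "s3://example/rates", {"benefits"}))

benefit_sets = [entry.name for entry in catalog.search("benefits")]
```

Here `benefit_sets` holds the two data sets tagged `benefits`, regardless of which office registered them.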
Address weaknesses in data governance
Data governance is vital to data readiness. Modernization projects succeed or fail on the strength of their planning and governance efforts. With data proliferating across and outside agencies, agency CIOs need to have governance and control functions in place to manage complex projects. Sharing agreements between source agencies and project teams are needed to ensure everyone knows how data will be collected and used.
Conducting a thorough inventory of data sources is a good first step. Until agencies know what data they have, IT leaders cannot proceed with a clear vision. Once IT leaders understand the data sources, they need to develop a catalog to organize the data.
The primary focus of data governance today is to define rules and policies for protecting and securing sensitive data, such as PII, as it is defined by common data privacy regulations. The government plays a pivotal role in securing citizen data and many other sensitive data sets, so governance practices and technologies must monitor data use to make sure rules are followed when data is collected, moved, copied, analyzed and shared.
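One way to picture a governance control that both protects sensitive data and monitors its use is a function that masks PII before data is shared and records each access in an audit log. The sketch below is only an assumption-laden example: it detects a single pattern (U.S. Social Security numbers in `NNN-NN-NNNN` form), and the `share_record` function and log layout are hypothetical.

```python
import re

# Matches SSNs written as 123-45-6789; real policies cover many more PII types.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

audit_log: list[dict[str, object]] = []


def mask_pii(text: str) -> str:
    """Replace any SSN-shaped value with a fixed mask."""
    return SSN_PATTERN.sub("***-**-****", text)


def share_record(user: str, action: str, text: str) -> str:
    """Log who did what, note whether PII was present, and return masked text."""
    audit_log.append(
        {"user": user, "action": action, "pii_found": bool(SSN_PATTERN.search(text))}
    )
    return mask_pii(text)


shared = share_record("analyst1", "share", "Claimant SSN 123-45-6789 approved")
```

The audit entry gives data stewards a trail to review, while the masked text is what actually leaves the governed environment.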
The mission of federal data and information security leaders is also to improve trust in data. Agencies should designate data stewards, like Chief Data Officers, who can guide users of self-service technologies to apply governance and data quality standards. CDOs can mentor users in following governance policies as they work with, share and use data in analytics models and visualizations.
Modernize pipelines for analytics workflow automation
Mission-critical analytics and AI/ML workloads put stress on agencies to scale up and manage numerous data pipelines for provisioning workloads with ready data. Data pipelines ingest data from many sources and deliver it to target locations such as a data lake, data warehouse or analytics platform. Some data pipelines simply stream raw, unstructured data into a data lake; others involve complex data preparation workflows that include data cleansing, transformation, and enrichment before data sets can be operationalized.
To help agencies scale, workflow automation can orchestrate not only a higher number of workloads but also complex and interdependent processes for data cleansing, transformation, and enrichment within data pipelines. Workflow automation technologies enable agencies to develop data pipelines for provisioning numerous analytics and AI/ML workloads in an organized and repeatable way.
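The orchestration of interdependent steps can be sketched with Python's standard-library `graphlib`, which orders tasks so that each one runs only after the tasks it depends on. The task names and the no-op task bodies below are placeholders; dedicated workflow tools add scheduling, retries and monitoring on top of this basic idea.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Each task maps to the set of tasks that must finish before it starts.
dependencies: dict[str, set[str]] = {
    "ingest": set(),
    "cleanse": {"ingest"},
    "transform": {"cleanse"},
    "enrich": {"transform"},
    "load_warehouse": {"enrich"},
    "train_model": {"enrich"},
}


def run_workflow(
    deps: dict[str, set[str]], tasks: dict[str, Callable[[], None]]
) -> list[str]:
    """Run every task in an order that respects its dependencies."""
    executed = []
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()  # a real task would move or prepare data here
        executed.append(name)
    return executed


# Placeholder task bodies that do nothing.
tasks = {name: (lambda: None) for name in dependencies}
executed = run_workflow(dependencies, tasks)
```

Declaring dependencies as data rather than hard-coding the order is what makes the pipeline repeatable: adding a new workload means adding one entry, not rewriting the sequence.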
Additionally, IT administrators can use workflow automation to find and fix errors more rapidly in pipeline preparation processes. Workflow automation will be increasingly critical as government decision makers use AI-driven recommendations to meet mission goals.
The Federal Data Strategy was launched to create a focus on data and infrastructure for the future. As the federal government—from the DoD to civilian agencies—looks to strengthen the role of data as a strategic asset, focusing on these strategies will help IT leaders confidently access, secure and distribute the information necessary to make strategic decisions and operate adeptly, while providing a more seamless and connected government experience for citizens, partners and employees of federal agencies.
Accelerating data readiness at agencies requires the right balance of integration, governance and workflow automation. By focusing on these strategies, federal leaders will ensure their data is easier to discover and prepare.