A unified approach to data management enables safe, reliable training and deployment of your organization’s artificial intelligence applications, says Tej Tenmattam of Cloudera Government Solutions.
Training artificial intelligence models properly requires, above all, the right data. So it makes sense for an agency to establish an enterprise data governance and management strategy to improve confidence in deploying AI.
The data strategy should cover data regardless of format or where it is stored in hybrid cloud environments, said Tej Tenmattam, principal solutions engineer at Cloudera Government Solutions.
“Data relevant to AI models are scattered across on-premise databases, data lakes in the cloud and data in several departmental applications,” he said during Federal News Network’s Industry Exchange Data 2024.
What’s more, Tenmattam added, “this fragmentation makes it difficult to gather and prepare the data needed for training and then deploying AI models.”
The data an agency might need for AI likely exists in a wide variety of formats, and those datasets likely vary in quality and reliability too.
Layered on top of those data requirements are security needs, Tenmattam said.
Agencies must deploy safeguards such as role-based access controls, data encryption in transit and at rest, and compliance controls for personally identifiable information.
Moreover, Tenmattam advised, “you need data quality tools so you are able to do data profiling, data cleaning, lineage tracking and so on.”
It’s a tall order, but a necessary one, because poor or poorly managed data will predictably produce bad outcomes in artificial intelligence projects. As Tenmattam put it, “Data quality issues like incomplete, missing or old data … reduces the accuracy as well as reliability of the AI models that you’re building. Poor data quality and accessibility lead to biased and inaccurate or underperforming AI models.”
An agency needs a unified data strategy, a data fabric approach that encompasses integration, quality, security and governance, he said. The strategy should “support data discovery, how you provide access and how you share the data.”
Cloudera’s data platform encompasses many of the tools necessary to manage data in this manner, Tenmattam said, such as NiFi Expression Language, Kafka connectors, the Flink stream processing engine and Hadoop-Spark combinations for processing management.
“You need a data platform that is built on next-generation, cloud-native technologies,” such as Kubernetes, he said. The platform must “scale on AWS, Azure, Google Cloud or in your own on-premise environment. A multicloud strategy and the ability to easily scale are very important.”
A comprehensive approach to managing large datasets from multiple sources supports not only AI but also applications that combine AI with near real-time data analytics. He named aircraft maintenance, fraud detection, supply chain optimization, traffic monitoring and crime detection as examples. What these use cases have in common is the need to combine large amounts of data quickly from multiple sources. The most useful applications can flag anomalies as they occur, adding to an organization’s knowledge base for later analysis, Tenmattam said.
A solid data strategy can help an agency prepare for where the AI industry is heading too, he advised.
He predicted that users will sharpen their focus on what he called the explainability of artificial intelligence: “understanding how AI models arrive at decision[s]. That is crucial for building trust.”
Use of generative AI will accelerate, Tenmattam said. Output from large language models, in the form of new datasets, won’t be limited to text but will also include images and software code.
“Responsible AI development is extremely important to future trends,” he said. “Ethical considerations are becoming paramount in AI development and deployment.”
Discover more tips and advice shared during Industry Exchange Data 2024.
Copyright © 2024 Federal News Network. All rights reserved. This website is not intended for users located within the European Economic Area.
Tom Temin is host of the Federal Drive and has been providing insight on federal technology and management issues for more than 30 years.