Agencies should apply DataOps to their data for AI, machine learning
August 20, 2021 11:43 am
As federal agencies increasingly look to adopt artificial intelligence and machine learning for their missions and back-of-house business processes, they often hit one major early stumbling block: preparing their data, which can include data fusion from multiple sources, cleansing, transformation, validation and publishing. Agencies often have data stored in multiple silos and data lakes, making discovery difficult. In addition, that data is rarely in standardized formats, let alone formats usable by AI and ML. But by applying DevOps principles to their data strategies, they can overcome this stumbling block much more quickly, facilitating implementation of AI and ML tools.
Russell Dardenne, senior systems analyst at Geocent, refers to this new process as DataOps.
“In short, it’s bringing DevOps principles to the world of data,” he said. “It’s organizing data, cleansing it, making it usable, establishing security around it, hardening that data, all the things you would think about within a DevOps construct, when you’re delivering web services, features or applications. We’re taking those same principles of moving something up and down a pipeline, performing security and quality checks against it, individually delivering that data as usable.”
That also includes getting code changes from developers and promoting them into different pipelines for quality assurance before pushing them into production for users. Data engineers often spend a lot of their time acquiring the data, cleansing it, and transforming it into the work products that different AI and ML tools use. DataOps principles are about automating those processes to the extent possible, and breaking down silos that impede discovery.
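The automation described above can be sketched as a chain of small stages with a quality gate before publishing. This is only an illustrative sketch, not Geocent's implementation: the record shape, the stage names (`cleanse`, `transform`, `quality_check`) and the rule that every record needs a name are all assumptions made for the example.

```python
# Hypothetical record shape: {"id": ..., "name": ...}. Not from the article.

def cleanse(records):
    """Drop records that have no id and trim whitespace from string fields."""
    cleaned = []
    for rec in records:
        if rec.get("id") is None:
            continue
        cleaned.append(
            {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        )
    return cleaned


def transform(records):
    """Normalize the name field to lowercase so downstream tools see one format."""
    return [{**rec, "name": str(rec.get("name", "")).lower()} for rec in records]


def quality_check(records):
    """Quality gate (assumed rule): stop the run if any record has an empty name."""
    for rec in records:
        if not rec.get("name"):
            raise ValueError(f"record {rec['id']} has no name")
    return records


def run_pipeline(records, stages):
    """Pass the data through each stage in order; any stage may raise to halt."""
    for stage in stages:
        records = stage(records)
    return records


STAGES = [cleanse, transform, quality_check]

raw = [
    {"id": 1, "name": "  Alpha "},
    {"id": None, "name": "orphan"},
    {"id": 2, "name": "BETA"},
]
published = run_pipeline(raw, STAGES)
```

Because each stage is a plain function, the same list of stages could be run by a CI/CD job on every change, which is the sense in which the manual cleansing work gets automated.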
It’s also about bringing the data engineers, scientists and AI/ML engineers together on the process to ensure changes happen efficiently, accurately and early in the process. Greg Porter, senior systems architect at Geocent, said that’s what really breaks down the silos and helps get usable data out to production for the end user.
That also makes the discovery process easier. The more data an agency gathers, the harder it becomes to discover what’s needed and standardize it into a usable format. Data lakes have a way of becoming data swamps.
“DataOps enables us to automate data governance throughout the pipeline. To aid the data discovery process, we’re able to automatically generate metadata from the data as it progresses through the pipeline,” said Brian Priest, senior systems architect at Geocent. “So, whether the situation is trying to find data for training models, or data scientists just need to browse for specific data sets, we’re able to provide a catalog of metadata that provides a standardized common language to query across the big data landscape.”
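As a rough illustration of what automatically generated metadata and a searchable catalog might look like, the sketch below derives column names and types from sample rows and registers them for lookup. The `DatasetMetadata` and `Catalog` classes, the dataset names and the tags are hypothetical; the article does not describe the actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    name: str
    row_count: int
    columns: dict  # column name -> inferred Python type name
    tags: set = field(default_factory=set)


def generate_metadata(name, rows, tags=()):
    """Derive metadata from the data itself as it passes through a pipeline step."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            # Keep the first type seen for each column (a simplifying assumption).
            columns.setdefault(key, type(value).__name__)
    return DatasetMetadata(
        name=name, row_count=len(rows), columns=columns, tags=set(tags)
    )


class Catalog:
    """In-memory stand-in for a metadata catalog."""

    def __init__(self):
        self._entries = {}

    def register(self, meta):
        self._entries[meta.name] = meta

    def find(self, column=None, tag=None):
        """Return sorted names of datasets matching every filter that is given."""
        results = []
        for meta in self._entries.values():
            if column is not None and column not in meta.columns:
                continue
            if tag is not None and tag not in meta.tags:
                continue
            results.append(meta.name)
        return sorted(results)


catalog = Catalog()
flights = generate_metadata(
    "flights", [{"tail_number": "N123", "hours": 4.5}], tags={"training"}
)
catalog.register(flights)
catalog.register(generate_metadata("vendors", [{"vendor_id": 7, "name": "Acme"}]))
```

A data scientist looking for training data could then call `catalog.find(tag="training")`, while someone browsing for a specific field could call `catalog.find(column="hours")`.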
Organizations can also control the policies being applied in this pipeline at both the enterprise level and at the project level to account for things like classification and environment. That means organizations like the Defense Department and Intelligence Community can keep their classified data on premises, but still link everything together into an enterprise view in order to make the data more accessible to those who need it.
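One way to picture policies set at two levels is an enterprise default that each project can override. The sketch below is hypothetical throughout: the policy keys, the project names and the rule that classified data must be stored on premises are assumptions chosen to mirror the example in the text, not a description of any agency's configuration.

```python
# Enterprise-wide defaults (assumed keys and values).
ENTERPRISE_POLICY = {
    "encrypt_at_rest": True,
    "storage": "cloud",
    "retention_days": 365,
}

# Project-level overrides (hypothetical projects).
PROJECT_POLICIES = {
    "classified-imagery": {"classification": "classified", "storage": "on_prem"},
    "public-stats": {"classification": "unclassified"},
}


def effective_policy(project):
    """Start from the enterprise defaults, then apply the project's overrides."""
    policy = dict(ENTERPRISE_POLICY)
    policy.update(PROJECT_POLICIES.get(project, {}))
    # Assumed guard: classified data may never resolve to off-premises storage.
    if policy.get("classification") == "classified" and policy["storage"] != "on_prem":
        raise ValueError(f"{project}: classified data must stay on premises")
    return policy


def enterprise_view():
    """Summarize where each project's data lives, giving one linked overview."""
    return {name: effective_policy(name)["storage"] for name in PROJECT_POLICIES}
```

Here `enterprise_view()` lists every project alongside its storage location, so on-premises and cloud data sets appear in one place even though they are stored separately.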
And all of this is powered by the cloud, providing scalability, flexibility and containerization as necessary.
“We containerize everything so we can scale it out as much as we need and ensure portability. And all that is controlled through a CI/CD pipeline. And then the release out to production is controlled through that pipeline as well,” Porter said.
Cloud is the integral piece that puts these technologies and processes within reach for federal agencies. While this might be possible on-prem, building the kind of infrastructure needed would be an extremely heavy lift. Cloud providers already have that infrastructure in place.
Just like with any cloud or DevOps adoption, the technology is only part of the lift. The other part is building a culture that’s receptive to changing its processes in order to implement it. Dardenne said that starts with small, whitepaper-style proofs of concept, because the underlying concern with any big organizational “change” is how it will interact with the existing security posture.
“If you can show a case, a piece of success early on in a project, it really helps the cultural change, because the government sees that, ‘Hey, this is accessible to me, we can do this, we can get success out of this,’” Porter said. “So then they’re more willing to invest in the changes that need to be made to enable the whole picture of piecing all this together and making it work. But if you start with something very small, and give them that taste of success early on in that project, we’ve seen that that really does help with that cultural change in the long run.”