Artificial intelligence programs are multiplying like rabbits across the federal government. The Defense Department has tested AI for predictive maintenance on vehicles and aircraft. Civilian agencies have experimented with robotic process automation (RPA); pilots at the General Services Administration and the IRS helped employees save time on repetitive, low-skill tasks. In February, President Donald Trump signed an executive order expanding his administration’s efforts to foster the research and development of artificial intelligence tools in government.
On the industry side, Chris Sexsmith, Cloud Practice Lead for Emerging Technologies at Red Hat, says the field has reached the point where companies are becoming more concerned with a second layer: not only how to leverage AI itself, but also how to manage the data effectively.
“What are some of the ethical concerns around using that data?” Sexsmith asked. “Essentially, how does an average company or enterprise stay competitive in this industry while staying in line with always-evolving rules? And ultimately, how do we avoid some of the pitfalls of artificial intelligence in that process?”
One of the biggest concerns right now is the “black box”: once an AI system has analyzed data and produced an output, it is very difficult to see how that answer was reached. Sexsmith said agencies and organizations can take steps to avoid the black box with Red Hat’s Open Data Hub project.
“Open Data Hub is designed to foster an open ecosystem for AI/ML – a place for users, agencies, and other open source software vendors to build and develop together. As always at Red Hat, our goal is to be very accessible for users and developers to collectively build and share this next generation of toolsets,” Sexsmith said. “The ethical benefits in this model are huge – the code is open to inspection, freely available and easy to examine. We effectively sidestep the majority of black box scenarios that you may get with other proprietary solutions. It’s very easy to inspect what’s happening – the algorithms and code that are used on your datasets to tune your models, for instance – because they are 100% open source and available for analysis.”
Open Data Hub is a machine-learning-as-a-service toolbox built on top of Red Hat’s OpenShift, a platform for managing Linux containers. It is designed to be portable and to run in hybrid environments that span on-premises infrastructure and public clouds.
“We aim to give the data scientists, engineers and practitioners a head start with the infrastructure components and provide an easy path to running data analytics and machine learning in this distributed environment,” Sexsmith said. “Open Data Hub isn’t one solution, but an ecosystem of solutions built on OpenShift, our industry-leading solution centered around Kubernetes, which handles distributed scheduling of containers across on-prem and cloud environments. ODH provides a pluggable framework to incorporate existing software and tools, thereby enabling your data scientists, engineers and operations teams to execute on a safe and secure platform that is completely under your control.”
Red Hat is currently partnered with companies like NVIDIA, Seldon.io, and PerceptiLabs on the Open Data Hub project. It’s also working on the Mass Open Cloud, a collaboration of academia, industry and the state of Massachusetts.
Sexsmith sees many possibilities in this space for federal agencies to advance their AI capabilities. Geospatial reconnaissance, law enforcement, space exploration and the national labs are just a few of the federal mission areas that could benefit from AI’s ability to process massive amounts of data in an open, ethical way.
“Federal agencies obviously have a lot of restrictions on how data can be utilized and where that data can reside,” Sexsmith said. “So in this world of hybrid cloud, there is a need to be cautious and innovative at the same time. It is easy to inadvertently build bias into AI models and possibly make a bad situation worse. Full control of data and regular reviews of both code and data, including objective reviews of ML output, should be a top priority. At minimum, a human should always be in the loop. And while the simplicity of a proprietary service is often appealing, there is danger in not fully understanding how machine-learning results are reached. Code and data are more intertwined than ever, and the rules around data and privacy are always evolving. Maintaining control of your data in a secure open source environment is a smart move, and a primary goal of Open Data Hub.”