Agencies moving to a data in motion paradigm creates a ‘central nervous system’ for their missions
July 22, 2021 1:43 pm
This content is provided by Confluent.
The federal government is coming around to the importance of data, and is in the process of writing a significant amount of policy on how to fully leverage it. Not least among these new policies are the Defense Department’s five new “data decrees,” set out in a memo signed by Deputy Defense Secretary Kathleen Hicks in May. The memo, among other things, creates a data council for all DoD components to coordinate their activities and to increase data sharing among them.
Hicks wrote that the changes are “critical to improving performance and creating decision advantage at all echelons from the battlespace to the board room, ensuring U.S. competitive advantage. To accelerate the department’s efforts, leaders must ensure all DoD data is visible, accessible, understandable, linked, trustworthy, interoperable, and secure.”
There is an acute need to avoid repeating the mistakes of past data modernization efforts. Attempting to create yet another centralized data lake holding all of an agency’s data would just be more of the same, and would add to data sprawl, since existing databases aren’t likely to go anywhere. More importantly, data at rest in a datastore isn’t useful when it isn’t being queried and isn’t driving decision making. Instead, DoD and other federal agencies need to start thinking about creating a central connective tissue that conducts data across their organizations, and begin handling their data in motion as it is created and flows to those who need it.
“Data in motion is really about inverting that dynamic: rather than storing the data away in these silos where it’s static and asking retroactive questions, what you want to do is publish the data as a stream and constantly deliver it to the questions, or analysis,” said Will LaForest, public sector chief technology officer at Confluent. “Suppose I care about supply chain optimization, and I’m in the military. I want to constantly optimize whenever inventory arrives or leaves, when forces deploy, when production capability changes, updates are made to equipment capabilities, and hundreds of other events. All missions are really made up of thousands, millions, or even billions (when you are leveraging sensors) of these small events, things that are changing constantly throughout the day. So data in motion is about handling it as it happens, rather than just sending it to a database, and then after the fact, asking questions.”
It’s a matter of how data gets from one place to another within, and even between, organizations. Rather than storing it in one place and then having to go and find it to make use of it, Confluent creates a kind of central nervous system to automatically conduct data to where it’s needed. After all, if you touch a hot stove, the pain data isn’t stored in your hand until your brain goes looking for it to use it. That information is conducted to the brain automatically in order to drive action – namely, moving your hand. In this instance, why should artificial intelligences, to name just one use case, function any differently from organic ones?
Another use case where data needs to be handled in motion in order to trigger actions as quickly as possible is edge computing. Many federal agencies, especially ones with highly distributed missions that don’t always have reliable or constant connectivity, simply can’t wait for data collected in the field to travel back to centralized storage and compute, be processed, and then have the results transferred back out to them. For warfighters on the battlefield, or Customs and Border Protection agents screening vehicles at a port of entry, the opportunity to use that data has already passed by the time it’s processed. And that’s if they have connectivity. Handling the data in motion can deliver results and trigger actions at the speed of conflict.
And that’s where the central nervous system comes into play. It determines what data is anomalous, useful and relevant in the moment, conducts that data from the sensors to the edge compute for rapid analysis and action, and sends the rest back to central compute to be stored and analyzed later.
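The routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `route` function, the sensor readings, and the fixed threshold used to decide what counts as anomalous are all hypothetical, and a real deployment would use whatever detection logic the mission calls for.

```python
def route(readings, threshold):
    """Split readings into those handled at the edge now and those sent back for later.

    Assumption: a reading is "anomalous" when its value exceeds a fixed threshold.
    """
    edge, central = [], []
    for reading in readings:
        if reading["value"] > threshold:
            edge.append(reading)      # act on it immediately, close to the sensor
        else:
            central.append(reading)   # forward to central compute for storage and later analysis
    return edge, central


# Hypothetical sensor readings, for illustration only.
readings = [
    {"sensor": "s1", "value": 21.0},
    {"sensor": "s2", "value": 98.5},
    {"sensor": "s3", "value": 19.4},
]

edge, central = route(readings, threshold=90.0)
```

In this sketch only the reading from `s2` is acted on at the edge, while the other two are queued for central compute.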
“The idea is that one taps into some source of data and as changes occur, they are published as a sequence of events, and then any number of downstream consumers can consume it anywhere they happen to be. And this is really key, you are decoupling the producers of the data from the consumers. It’s what makes it operationally scale,” LaForest said. “That connective tissue of decoupling all the different actors within an organization, so they can all produce and consume independently is the only way to make it really scalable across a large organization when you are sharing data.”
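To make the decoupling LaForest describes concrete, here is a minimal in-memory sketch in Python. It is not Confluent’s or Apache Kafka’s API: the `EventLog` class, its `publish` and `poll` methods, and the event fields are hypothetical names chosen for illustration. The point it shows is that producers only append events to a shared log, while each consumer tracks its own position and reads at its own pace.

```python
from collections import defaultdict


class EventLog:
    """Hypothetical append-only event log with independent consumer offsets."""

    def __init__(self):
        self._events = []
        self._offsets = defaultdict(int)  # consumer id -> index of next unread event

    def publish(self, event):
        # Producers append events and know nothing about who will read them.
        self._events.append(event)

    def poll(self, consumer_id):
        # Each consumer receives everything it has not yet seen, independently of others.
        start = self._offsets[consumer_id]
        batch = self._events[start:]
        self._offsets[consumer_id] = len(self._events)
        return batch


log = EventLog()
log.publish({"type": "inventory_arrived", "item": "tires", "qty": 40})
log.publish({"type": "unit_deployed", "unit": "A"})

logistics_batch = log.poll("logistics")    # sees the first two events
log.publish({"type": "inventory_left", "item": "tires", "qty": 10})
logistics_next = log.poll("logistics")     # sees only the new event
analytics_batch = log.poll("analytics")    # a late consumer still sees all three
```

Because the log, not the producer, remembers where each consumer left off, a new consumer can be added later without any change to the code that publishes events.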
Another relevant use case is cybersecurity. President Joe Biden’s recent executive order encouraged agencies to adopt a zero trust approach to cyber. But LaForest said the problem there, again, is scale: a key component of zero trust is monitoring and threat detection based on observability data, but agencies produce massive amounts of it and have to figure out what to use and which tools to send it to. That’s where Confluent comes in.
“Doing the same thing we’ve always done and just calling it something different is not a solution,” LaForest said. “Making another data lake doesn’t constitute a data strategy. That’s one technique. But really, you need more than that.”