NASA adopting Apache Kafka to enable real-time data from Mars

Kevin McCaney

September 8, 2021 3:12 pm

Federal government agencies are seeing an increased need to make quick use of real-time data, in areas ranging from healthcare and processing benefits requests to managing fleets of vehicles and supporting military missions. For NASA, real-time data also extends into the far frontiers via its Deep Space Network (DSN). It’s another example of how agencies are using Apache Kafka to set their data in motion.

The DSN is rooted in NASA’s three massive radio antenna sites, spaced roughly equidistantly around the globe. One is located at Goldstone, California. The other two are near Madrid, Spain, and Canberra, Australia. But the network’s reach extends far into space in support of the space agency’s interplanetary missions (as well as a few missions in orbit around Earth). Operated by the Jet Propulsion Laboratory (JPL) in Pasadena, California, the DSN collects the feeds from missions such as the ones exploring Mars.

Although some commands and data can be transmitted directly between Earth and NASA’s rovers, the mass of information being generated first needs to go through the Mars Relay Network. That network is composed of a constellation of orbiters collecting scientific data from the Perseverance rover, which touched down in February 2021, as well as the Curiosity rover, which has been exploring the red planet since 2012, and the InSight lander, which has been operating there since 2018.

For most of the agency’s space exploration history, NASA’s workflows from space were slow, even accounting for the transmission times required across tens of millions of miles. The building and launching process for satellites and probes happens over a long lifecycle, so missions often went into space with some technology that was already out of date. The DSN has always been reliable, but it couldn’t accommodate the massive growth of data or the real-time use of it.

That’s changing now because of JPL’s use of Apache Kafka, which enables the sharing and use of data as it’s created.

As NASA’s space mission continues to expand, the amount of data it collects grows exponentially.

“We’re going to be getting higher data rates from spacecraft, more information into the network, more information that needs to be monitored, and we’re going to need a way to deal with this information, and parcel it and make sense of it very quickly,” Rishi Verma, a NASA-JPL data architect, said at a recent Kafka Summit in San Francisco.

In addition to notable missions such as the exploration of Mars, the moon and the outer solar system, NASA is also collecting information from a growing number of other, smaller efforts, including CubeSats. JPL has been aiming to put all that data into a single picture.

Real-time pictures from the great beyond

Apache Kafka handles data in a portable format, allowing it to interact with any system either generating or receiving data, including logs, third-party apps, custom apps and microservices, as well as legacy databases where information often is otherwise trapped. A global fabric of Kafka clusters allows for real-time sharing, event streaming and other combinations of both real-time and historical data.
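The mix of real-time and historical data is easier to picture with a small model. The following is a minimal sketch in Python, not Kafka's actual API: `InMemoryTopic` and its methods are hypothetical names, and the events are made up. It shows only the underlying idea of an append-only log in which each reader chooses the position it starts from.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class InMemoryTopic:
    """Toy stand-in for a topic: an append-only list of events (hypothetical)."""
    events: list[Any] = field(default_factory=list)

    def append(self, event: Any) -> int:
        """Store an event and return its offset (its position in the log)."""
        self.events.append(event)
        return len(self.events) - 1

    def read_from(self, offset: int) -> list[Any]:
        """Return every event at or after the given offset."""
        return self.events[offset:]


topic = InMemoryTopic()
topic.append({"source": "orbiter", "status": "nominal"})
topic.append({"source": "rover", "status": "nominal"})

# A dashboard that joined late can replay history from offset 0.
history = topic.read_from(0)
# A live monitor asks only for what arrived after its last read.
latest = topic.read_from(1)
```

Because events are never overwritten, the same log can serve a reader that wants the full history and one that wants only the newest entries.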

Confluent, the company founded by the original creators of the open-source Apache Kafka project, has partnered with agencies to work with data in ways that previously weren’t possible, including responsive citizen engagement, real-time situational awareness, anomaly detection, event-driven missions and security operations.

For NASA, JPL is using Kafka to collect real-time information from its missions that can help monitor their progress and identify factors affecting complex operations. With the Mars Reconnaissance Orbiter, for example, the DSN will receive direct evidence when the orbiter passes behind the planet, along with real-time equipment status. Seeing any differences between NASA’s predicted model and what actually happened could reveal degradation of equipment.
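The comparison of a predicted model against observed behavior can be sketched in a few lines. This is an illustrative example only: the function name, the signal values and the tolerance are all assumptions made for the sketch, not JPL's method or data.

```python
def find_deviations(predicted, observed, tolerance):
    """Return (index, predicted, observed) for samples that differ by more than tolerance."""
    return [
        (i, p, o)
        for i, (p, o) in enumerate(zip(predicted, observed))
        if abs(p - o) > tolerance
    ]


# Hypothetical signal levels: what the model expected vs. what was received.
predicted_signal = [10.0, 10.0, 10.0, 10.0]
observed_signal = [10.1, 9.9, 8.2, 10.0]

flagged = find_deviations(predicted_signal, observed_signal, tolerance=0.5)
# Only the third sample (index 2) falls outside the tolerance.
```

A persistent run of flagged samples, rather than a single outlier, is the kind of pattern that might point to degrading equipment.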

In the DSN, raw data often comes into the system in a proprietary format. That data is now transformed by Kafka so that it can be shared. The approach helps NASA keep its packets small, improves data rates and reduces latency in the network. It also allows JPL to get fault alerts, which help missions avoid downtime and improve efficiency through automation.
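To make the idea of converting a proprietary format into a shareable one concrete, here is a minimal sketch. The binary layout, field names and values are invented for illustration; the article does not describe the DSN's actual formats.

```python
import json
import struct

# Hypothetical frame layout (an assumption for this sketch): big-endian
# uint16 antenna id, uint32 Unix timestamp, float32 signal level.
FRAME_FORMAT = ">HIf"


def frame_to_json(frame: bytes) -> str:
    """Decode one fixed-size binary frame into a portable JSON event."""
    antenna_id, timestamp, signal_level = struct.unpack(FRAME_FORMAT, frame)
    return json.dumps({
        "antenna_id": antenna_id,
        "timestamp": timestamp,
        "signal_level": round(signal_level, 2),
    })


# Build a sample frame, then convert it as a stream processor might.
raw = struct.pack(FRAME_FORMAT, 14, 1631113920, -120.5)
event = frame_to_json(raw)
```

The binary frame stays compact on the wire, while the decoded event can be read by any downstream tool that understands JSON.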

NASA is also using Kafka to combine its data streams into a visualization system that presents a clear picture of what’s happening across all ongoing missions. Staff can then use that information to detect anomalies, and all DSN staff get direct access to the data.
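Combining several streams into a single view amounts to interleaving events by time. The sketch below uses Python's standard `heapq.merge` on two made-up, already time-ordered streams; the event tuples and the `merge_streams` helper are assumptions for illustration, not a description of JPL's visualization system.

```python
import heapq

# Hypothetical events as (timestamp, source, message), each stream already sorted by time.
orbiter_events = [(1, "orbiter", "link up"), (4, "orbiter", "link down")]
rover_events = [(2, "rover", "drive start"), (3, "rover", "drive stop")]


def merge_streams(*streams):
    """Interleave time-ordered streams into one timeline sorted by timestamp."""
    return list(heapq.merge(*streams, key=lambda event: event[0]))


combined = merge_streams(orbiter_events, rover_events)
# The combined timeline runs through timestamps 1, 2, 3, 4 across both sources.
```

A single ordered timeline like this is what lets a dashboard show activity across missions side by side.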

Data in motion unites agency efforts

Apache Kafka allows for the constant cycle of creating, consuming and sharing data, which has increasingly become a priority for federal agencies working under the Federal Data Strategy. It can be applied in the far reaches of the solar system as well as on the more terrestrial efforts of most agencies.

For example, Confluent recently partnered with the Centers for Disease Control and Prevention. Kafka helped the agency pull together disparate data feeds on COVID-19 from around the country and integrate them into a holistic national view.

No matter how it’s used, Kafka can help agencies fulfill the consistent goal of getting the right information into the hands of the right people at the right time.
