
NASA JPL building models of its petabytes of data with artificial intelligence

Artificial intelligence and machine learning are critical to automating or extracting insight from the hundreds of petabytes of data generated by JPL.

Amelia Brust (@abrustWFED)

May 11, 2021 2:36 pm



NASA Jet Propulsion Laboratory is capturing more data than ever in its history. That has Program Manager and Principal Computer Scientist Daniel Crichton excited about his mission: Using data to understand Earth, the solar system and beyond.

Artificial intelligence and machine learning are critical to automating or extracting insight from the hundreds of petabytes of data generated by JPL. From Earth to the Mars rover to observatories touring outer space, machine learning and AI present immense opportunities, he said. In many cases, better tools are to credit – or blame – for more data.

“So our instruments are producing much higher resolution data. So we have cameras that – and we see these on our cell phones that are now producing things like, what 8K and larger types of images,” Crichton said on Federal Monthly Insights – Artificial Intelligence and Data. “And we push those images and those capabilities out to our spacecraft, so we have better and better high definition cameras. And part of our challenge is actually we’re bandwidth limited.”

Crichton said JPL is trying to build representation models of the data using artificial intelligence. Doing so requires training those models on sets of information that NASA is interested in. The models could help NASA move from human-tested processes to compute-intensive processes.

Nevertheless, it can be hard to know when a training set needs updating. JPL requires a continuous process in that regard, because the organization does not know exactly what it is looking for.

“One of the big challenges we have in our world, in science is trying to build a representation training set that really can capture the totality of what we want to be able to discover in our data,” Crichton said on Federal Drive with Tom Temin. “We may discover anomalies in the data and identify new features that we want to be able to look for, and go back and update our training sets and improve our models. And so, it’s really an iterative process of trying to actually train our models, discover new things, and reclassify the kinds of information that we’ve actually even seen in the past.”
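The loop Crichton describes (train, discover anomalies, update the training set, reclassify past data) can be sketched in a few lines of Python. This is an illustration only, not JPL's method: the nearest-centroid model, the one-dimensional measurements, the labels and the distance threshold are all assumptions made for the example, and the step where a scientist names the new feature is simulated with a hard-coded label.

```python
from statistics import mean


def train(training_set):
    """Fit a nearest-centroid model: one mean value per label."""
    by_label = {}
    for value, label in training_set:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def classify(model, value, max_distance):
    """Return the nearest label, or None when no centroid is close enough."""
    label, centroid = min(model.items(), key=lambda item: abs(item[1] - value))
    return label if abs(centroid - value) <= max_distance else None


# Hypothetical labeled measurements: (value, label).
training_set = [(1.0, "rock"), (1.2, "rock"), (5.0, "dust"), (5.3, "dust")]
# Hypothetical archive of past observations.
archive = [1.1, 5.1, 9.8, 10.1, 1.3]

# First pass: anything the model cannot place is flagged as an anomaly (None).
model = train(training_set)
first_pass = [classify(model, v, max_distance=1.0) for v in archive]
anomalies = [v for v, label in zip(archive, first_pass) if label is None]

# Assumed review step: a scientist inspects the anomalies and names a new feature.
training_set += [(v, "ice") for v in anomalies]

# Second pass: retrain, then reclassify everything seen in the past.
model = train(training_set)
second_pass = [classify(model, v, max_distance=1.0) for v in archive]
```

In the first pass the two outlying values come back unlabeled. Once they are added to the training set under a new label, the retrained model assigns them that label when the archive is reclassified.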

To do that, Crichton said it is important to have sufficient metadata. That will help different scientists work together to solve a problem. To this end, JPL works to set worldwide standards for planetary missions and to develop standard metadata structures.

“That’s been in coordination with other space agencies around the world to make sure that we can actually develop [an] open science infrastructure to be able to share our data, support the discovery and so forth,” he said. “And that means having well-curated data – metadata catalogs, capturing the data and developing ways in which we can really interconnect our data to actually advance science as a part of humanity.”

Then there is the matter of housing the data and keeping it available. Physical data can reside in cloud infrastructures such as Amazon, Google and Microsoft’s Azure, and the metadata can point to those infrastructures. As JPL reaches the petabyte level of data, Crichton said, the organization needs a way to scale up storage, which, along with complexity, is a major challenge for missions.
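One way to picture metadata that points to data held elsewhere is a small catalog in which each entry describes a data product and records where its bytes live. The sketch below is a hypothetical illustration, not JPL's catalog design: the field names, product IDs, bucket names and sizes are all invented for the example.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata about one data product; the data itself lives at storage_uri."""
    product_id: str
    mission: str
    instrument: str
    storage_uri: str
    size_bytes: int


def find(catalog, mission):
    """Return every catalog entry recorded for the given mission."""
    return [entry for entry in catalog if entry.mission == mission]


def storage_scheme(entry):
    """Return the URI scheme, which hints at where the data is stored."""
    return urlparse(entry.storage_uri).scheme


# Hypothetical entries; the buckets and paths are placeholders.
catalog = [
    CatalogEntry("img-0001", "mars-rover", "camera",
                 "s3://example-bucket/mars/img-0001.dat", 48_000_000),
    CatalogEntry("img-0002", "mars-rover", "camera",
                 "gs://example-bucket/mars/img-0002.dat", 52_000_000),
    CatalogEntry("spec-0001", "earth-orbiter", "spectrometer",
                 "s3://example-bucket/earth/spec-0001.dat", 7_500_000),
]

mars = find(catalog, "mars-rover")
mars_bytes = sum(entry.size_bytes for entry in mars)
schemes = sorted({storage_scheme(entry) for entry in catalog})
```

Because a search touches only the catalog, a scientist can find every product for a mission and total its size without downloading any of the underlying files.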

“This is going to become the permanent record of what we’ve learned from our missions,” he said. “And so it’s very important that we treat that as a long-term archive, that we put in good practices of how we actually do quality checking of that data, that we look at ways in which we can ensure the integrity of it long term – and that we really treat it as the golden assets of our space age.”