Conflating geospatial data at scale: A better, faster method of achieving mission objectives

Imagine being tasked with planning critical missions for an intelligence organization or other federal agency. It’s a daunting job with a lot at stake. Often, national security and lives are on the line, making it essential to have a wealth of geospatial data quickly to make smart decisions in multi-domain operations. The issue: doing so requires conflating enormous amounts of geospatial and other information from a wide variety of sources.

Getting beyond two datasets

Traditionally, geospatial data conflation has involved compiling and reconciling two different geospatial datasets covering overlapping regions. The goal is to merge the best-quality elements of the two into a single, more accurate dataset. This work has typically been done by data analysts who, although smart and dedicated, can quickly become overwhelmed when trying to combine more than two separate datasets simultaneously.

The result is that mission planners only get a limited picture of the situation. Looking at two maps isn’t so bad, but imagine trying to flip through a radar signal table with hundreds of columns and thousands of rows. Now add in cell phone intercepts by tower positions. It’s not optimal for decision-making. When geospatial features (positions, edges, shapes, etc.) are combined with non-geospatial features (text descriptions, radar frequencies, phone numbers, and so on), the challenge of geospatial conflation increases, but the resulting output, if well made, becomes more valuable, too.

Geospatial data goes well beyond street maps. Buildings, roads, cell phone towers, radar emitters, oil wells, and even natural features like ocean trenches are a few of the objects that must now be mastered across datasets and plotted. Moving objects such as ships, planes, trucks, and automobiles can be tracked and identified across datasets with similar techniques. Even humans, when carrying cell phones, wearing smart watches, or working on a computer, can be considered geospatial objects.

With the volume and variety of geospatial data increasing, relying solely on humans for geospatial data conflation and analysis means a lot of time-intensive work. As a result, requests for information can take hours, when critical strategic decisions often need to occur in minutes.

Exploring more ways to conflate using technology

Due to the importance of conflation in spatial analysis, different approaches to the problem have been proposed, ranging from simple buffer-based methods to probability, optimization, and machine learning-based models. The ultimate goal is to enable mission planners to combine an almost limitless array of data: signal intelligence, IoT sensors, city and street data, satellite information, building information, or video from overhead drones. Planners also need to see how the picture has changed over time, and easily deal with different levels of detail as the mission unfolds and situations change.
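The simplest of these, buffer-based matching, pairs each feature in one dataset with the nearest feature in the other that falls within a fixed buffer distance. The sketch below is illustrative only: the record layout (name, latitude, longitude) and the 50-meter default radius are assumptions, not any particular product's implementation.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def buffer_match(features_a, features_b, buffer_m=50.0):
    """Pair each (name, lat, lon) feature in A with the nearest feature in B
    inside the buffer radius; features with no candidate are returned separately."""
    matches, unmatched = [], []
    for name_a, lat_a, lon_a in features_a:
        best = None
        for name_b, lat_b, lon_b in features_b:
            d = haversine_m(lat_a, lon_a, lat_b, lon_b)
            if d <= buffer_m and (best is None or d < best[1]):
                best = (name_b, d)
        if best:
            matches.append((name_a, best[0], round(best[1], 1)))
        else:
            unmatched.append(name_a)
    return matches, unmatched
```

The brute-force nested loop is what limits this approach at scale: with millions of features per source, production systems replace it with spatial indexing, which is one reason the field moved toward the probabilistic and machine learning-based models discussed below.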

The issue with many of the techniques tried for geospatial conflation is a lack of scalability, owing to the need for extensive human involvement. Machine learning-driven data mastering with limited human involvement eliminates the scalability limitations that plague other methods.

The quality and quantity of data available to enhance situational analysis and readiness will only improve with the advent of joint all-domain command and control (JADC2) data sharing. The ability to conflate that data is also set to improve. Despite the limitations of past attempts, new, more effective geospatial data conflation techniques are emerging that allow data analysts to better reach mission objectives by conflating billions of data points quickly and at scale. These new conflation techniques give decision-makers an optimal, “high resolution” view of the situation instead of a partial one. Because automated systems can run on powerful servers—and work around the clock—different layers of information can come together in minutes versus hours.

Human-in-the-loop data mastering

Data mastering that combines machine learning-based models with human feedback as needed is the heart of these new solutions. They can easily integrate geospatial data with data from hundreds of other sources and in a variety of formats into a common view.

The key to better decisions is the clear picture provided once the geospatial and other data have been enriched and unified. With new systems, data can easily be conflated through schema mapping with descriptive tags, codes and numerical columns (such as elevation data) across dozens of sources. Map features can be categorized to a standard taxonomy of types for data unification. Adding to the benefits, new systems can probabilistically match features across data sources to automate edge matching, rubber sheeting, and other techniques, or just to spot differences, leaving only low-confidence cases for human review and input.
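A toy version of that probabilistic matching and triage step might look like the following: positional proximity and name similarity are blended into a confidence score, high scores are auto-accepted, low scores auto-rejected, and the middle band is queued for human review. The 50/50 weighting, the thresholds, and the distance approximation are all illustrative assumptions, not the scoring model of any real system.

```python
import difflib
import math

def _dist_m(lat1, lon1, lat2, lon2):
    # Equirectangular approximation -- adequate for nearby points;
    # a production system would use a proper geodesic distance.
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6_371_000 * math.hypot(x, y)

def match_confidence(rec_a, rec_b, max_dist_m=200.0):
    """Blend positional proximity with name similarity into a 0-1 score.
    The equal weighting is an assumed, illustrative choice."""
    name_a, lat_a, lon_a = rec_a
    name_b, lat_b, lon_b = rec_b
    d = _dist_m(lat_a, lon_a, lat_b, lon_b)
    pos_score = max(0.0, 1.0 - d / max_dist_m)
    name_score = difflib.SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    return 0.5 * pos_score + 0.5 * name_score

def triage(pairs, accept=0.85, reject=0.40):
    """Auto-accept confident matches, auto-reject clear non-matches,
    and leave only the ambiguous middle band for human review."""
    accepted, review, rejected = [], [], []
    for a, b in pairs:
        score = match_confidence(a, b)
        bucket = accepted if score >= accept else rejected if score <= reject else review
        bucket.append((a[0], b[0], round(score, 2)))
    return accepted, review, rejected
```

The design point this illustrates is the one in the paragraph above: the automated path handles the confident cases in both directions, so analysts spend their time only on the narrow band of genuinely ambiguous pairs.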

These systems can tackle other tough challenges typical of geospatial conflation. For example, identifying objects or signals that exist not in a known position but in a possible area can be tricky for a human analyst, but is tractable for machine learning-based analysis systems. If there is any question about the associations the algorithms suggest, humans can train the model to resolve ambiguous situations in an appropriate fashion, and the model will learn from their feedback. Another challenging issue that can arise with geospatial conflation is time-based uncertainty. Again, machines can readily analyze time tracks and identify overlaps, learning as they go to speed up the process each time similar data is encountered.
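At its core, the time-track analysis mentioned above reduces to interval-overlap arithmetic: two observations from different sensors can describe the same moving object only if their observation windows overlap. A minimal sketch, in which the track layout and the one-minute minimum-overlap threshold are assumed for illustration:

```python
from datetime import datetime

def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds (0 if disjoint)."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max(0.0, (end - start).total_seconds())

def cotemporal_pairs(tracks_a, tracks_b, min_overlap_s=60.0):
    """Flag (id, start, end) track pairs observed during overlapping windows --
    candidates for being the same moving object, pending further matching."""
    pairs = []
    for id_a, s_a, e_a in tracks_a:
        for id_b, s_b, e_b in tracks_b:
            ov = overlap_seconds(s_a, e_a, s_b, e_b)
            if ov >= min_overlap_s:
                pairs.append((id_a, id_b, ov))
    return pairs
```

Temporal overlap alone does not confirm an association; in practice it acts as a cheap filter that narrows the candidate set before positional and attribute matching are applied.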

A final thought

Federal agencies are struggling under a deluge of geospatial data, and the ability to conflate it quickly has become a crucial, often unmet requirement. Until recently, conflating and enriching geospatial data at scale has been nearly impossible, but geospatial data can now be transformed into a powerful asset. With cloud-native, automated conflation techniques that involve experts on an as-needed basis, organizations now have the opportunity to take full advantage of their geospatial datasets to achieve mission objectives in more scalable, faster, and smarter ways.

By Ed Custer, Senior Solutions Architect, Tamr
