Big data, AI are inextricably linked

This content is provided by Teradata

By Patience Wait

The two technologies need each other. Artificial Intelligence. Big Data. Both phrases have been buzzwords in the technology arena, trends with great but as yet unfulfilled promise. To a large extent, that potential has gone unmet because neither technology had the support it needed to mature.

Big Data, for instance, has been around for years, ever since data storage capacities got large enough – and cheap enough – to keep massive quantities of information. As data stores grew, data organization became essential, because having the data doesn’t matter if you can’t retrieve and analyze it in meaningful ways.

Having the data, even well organized, wasn’t enough. The next step was handling unstructured data, such as machine logs, social media feeds, video, audio, and XML-type data, and combining it with structured data, the kind of information that normally fits into a spreadsheet or structured database tables. Then new algorithms had to be developed to tease out previously unseen patterns and relationships, which is the role artificial intelligence plays.

Meanwhile, in the AI field the challenge was getting sufficient quantities of data to work with in order to develop useful algorithms. In early stages of development, AI projects had to use limited data sets, because there weren’t the vast, organized quantities of “real” data to crunch.

That limitation has been smashed to bits – and just in time.

The volume of data – from social media, the Internet of Things, the Industrial IoT, and so on – is growing at exponential rates. According to a 2015 Forbes article on the growth of Big Data, by 2020 about 1.7 megabytes of new information will be created every second for every human being on the planet. By that year, the accumulated universe of data will grow to around 44 zettabytes – that’s 44 trillion gigabytes! Yet less than 0.5 percent of all data is ever analyzed and used, according to the article.
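
To make the scale concrete, the unit conversion behind the "44 trillion gigabytes" figure can be checked with a few lines of Python. This is only a sketch: it assumes decimal (SI) units, and it applies the article's "less than 0.5 percent" share to the 44-zettabyte total purely for illustration, which the source does not do.

```python
# Decimal (SI) unit sizes, in bytes. Binary units (2**30, 2**70) would differ slightly.
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zettabytes):
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * ZETTABYTE // GIGABYTE


total_gb = zettabytes_to_gigabytes(44)
print(f"{total_gb:,} GB")  # 44,000,000,000,000 GB

# Assumption for illustration only: treat 0.5 percent of the 44 ZB total
# as an upper bound on the data that is ever analyzed.
analyzed_upper_bound_eb = 44 * ZETTABYTE * 5 // 1000 // EXABYTE
print(f"{analyzed_upper_bound_eb} EB")  # 220 EB
```

Even under that generous assumption, the analyzed share would amount to roughly 220 exabytes, with the rest untouched.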

Now we are on the verge of true machine learning, a necessary foundation of AI.

“Machines have been involved in storing and analyzing data for a long time, but usually at the behest of a human,” said Alan Ford, Pre-sales Director for Teradata Government Systems. “Now we need the machines to learn to analyze data themselves.”

Machine learning algorithms require a lot of data in order to teach themselves how to analyze it, Ford said. For example, in order to develop image recognition algorithms (which are less complex than facial recognition algorithms), a computer needs vast numbers of “training set” images. In order for an algorithm to identify an image of a cat, it must first “see” and analyze tens of thousands of pictures of cats, as well as images that aren’t of cats, and be told which is which, usually by means of a simple label or tag. This is known as supervised learning.

But once that computer can identify cat images, it can teach itself much faster how to identify dog images, because it has both a learning process to draw upon and something to compare the new images to. Now imagine doing that at high processor speeds with terabytes, exabytes, even zettabytes of data.
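
The supervised learning idea described above, labeled examples in and a classifier out, can be sketched in a few lines of Python. This is a toy illustration, not how production image recognition works: the two-number feature vectors are hypothetical stand-ins for images, and the nearest-centroid method is a deliberately simple technique chosen for brevity.

```python
from math import dist
from statistics import mean


def train(examples):
    """Nearest-centroid training: average the feature vectors for each label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid lies closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical, hand-made feature vectors; a real system would derive
# features from tens of thousands of labeled images.
training_set = [
    ((0.9, 0.2), "cat"),
    ((0.8, 0.3), "cat"),
    ((0.7, 0.1), "cat"),
    ((0.2, 0.8), "not cat"),
    ((0.1, 0.9), "not cat"),
    ((0.3, 0.7), "not cat"),
]

model = train(training_set)
print(predict(model, (0.85, 0.25)))  # cat
print(predict(model, (0.15, 0.75)))  # not cat
```

The labels do the teaching here: without the "cat" and "not cat" tags, the program would have no way to know which group a new example should join.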

So AI needs Big Data in order to learn. And here is where Big Data needs AI.

“Given enough training and enough data, the algorithms can start refining or improving themselves,” Ford said. “They aren’t sentient in the way human beings are, but they can identify new and different patterns, and synthesize whether something is a right or a wrong behavior. They can then invoke a reaction, a response, whether from a human or another machine.”

Think of the applications. For instance, in cybersecurity there’s a tremendous amount of information created by a government agency’s networks – not the actual traffic content, but the metadata. It would be impossible for a human to search all that information for a bad actor, Ford noted.

Through machine learning, an AI computer could not only locate that bad actor, but also identify patterns of behavior, compare them to patterns shown by other bad actors, and determine if there are early warnings that can be incorporated into network monitoring. That might also lead to adapting network defenses in real time in order to safeguard the most valuable data, or to taking other countermeasures, such as tracking the source of the attack.
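
As a much simpler stand-in for the machine learning described above, the sketch below flags hosts whose connection counts stand out from the rest, using a robust statistical test (the modified z-score, based on the median absolute deviation). The host names, counts, and the `flag_outliers` helper are all hypothetical; a real system would learn from far richer metadata.

```python
from statistics import median


def flag_outliers(counts, threshold=3.5):
    """Return hosts whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by the outliers themselves than mean and standard deviation.
    """
    values = list(counts.values())
    center = median(values)
    mad = median([abs(value - center) for value in values])
    if mad == 0:
        return []  # No spread to measure against; nothing can be flagged.
    return sorted(
        host
        for host, value in counts.items()
        if 0.6745 * abs(value - center) / mad > threshold
    )


# Hypothetical metadata: outbound connections per host in one hour.
connections_per_host = {
    "host-a": 120,
    "host-b": 115,
    "host-c": 130,
    "host-d": 125,
    "host-e": 118,
    "host-f": 950,
}

print(flag_outliers(connections_per_host))  # ['host-f']
```

A fixed statistical rule like this only catches the obvious cases; the appeal of machine learning is that it can pick up subtler patterns that no one thought to write a rule for.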

U.S. Air Force jets may have hundreds of sensors monitoring engine performance, environmental conditions, structural integrity, and other factors. A single flight could generate terabytes of information. Combining all that data with comparable data from all flights can enable AI programs to develop “condition-based maintenance” programs, cutting maintenance costs and increasing flight availability.

There are thousands, if not millions, of such examples. There is at least one major consideration to take into account, however: Because there is so much data, both already existing and still being generated, moving it is problematic.

Ideally, the data should move only once – from the source that creates it to the storage that holds it, Ford suggested. This means that AI analytics engines should migrate to the data, rather than compel data migration. This offers significant gains, by orders of magnitude, in speed, efficiency, and cost-effectiveness.
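
One way to picture moving the analytics to the data: each storage node computes a small summary of its own records, and only those summaries travel over the network. The sketch below is illustrative only; the node names, readings, and both helper functions are hypothetical, and plain Python lists stand in for what would be large distributed stores.

```python
def local_summary(readings):
    """Runs where the data lives; returns a small (count, total) pair."""
    return len(readings), sum(readings)


def combine(summaries):
    """Runs at the coordinator; merges per-node summaries into a global mean."""
    count = sum(node_count for node_count, _ in summaries)
    total = sum(node_total for _, node_total in summaries)
    return total / count


# Hypothetical sensor readings held on three separate storage nodes.
nodes = {
    "node-1": [10, 20, 30],
    "node-2": [40, 50],
    "node-3": [60],
}

# Only two numbers per node cross the network, however large each node's data is.
summaries = [local_summary(readings) for readings in nodes.values()]
global_mean = combine(summaries)
print(global_mean)  # 35.0
```

The result matches what a central computation over all six readings would give, but the raw records never leave their nodes.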

As Big Data and AI empower each other, they also empower people. Mundane tasks, from data entry and data scrubbing to trying to parse thousands of records in search of outliers, can be delegated to machines, leaving humans free to be creative, innovative, and entrepreneurial.