The volume of data being created is so enormous that it is fast outpacing the ability to capture, search, analyze or store it. Big Data is the general term used to describe the 2.5 quintillion bytes of data produced each day.
“On average, people are collecting almost 50 percent more data a year, so it’s just a lot more data coming at them every day,” said Mark Weber, NetApp’s president of the U.S. public sector.
“Big Data is just datasets that have gotten so large and complex that people don’t have the tools or the ability to capture it, store it, search it, retrieve it, analyze it,” Weber said. “They just don’t have the proper equipment or technology to do that.”
Weber spoke to The Federal Drive with Tom Temin and Emily Kopp about a Meritalk study called The Big Data Gap, which his company sponsored. The ever-growing amount of data is creating challenges for agencies that are trying to mine it.
“If you think about it, Big Data is a gold mine,” Weber said. “What you have there is kind of your family jewels and you should be able to mine that data to make great decisions, to improve your efficiency … to improve forecasting. So, if you’re not doing that, you’re going to be totally inefficient. You’re not going to be able to make good decisions for your agency, for the intelligence world, for health-care decisions. Part of seeing all the information you’re collecting is allowing you to be more efficient.”
Weber pointed to intelligence agencies as one community that’s in the data collection business and therefore faces the problems surrounding Big Data on a regular basis. “Analyzing that data and the amount of that data they have to analyze is really important to protect our country and keep us free,” he said.
Likewise, improved data analysis and processing could help health care agencies keep down costs.
“Doing studies and determining why people get re-admitted to hospitals, could save massive amounts of data,” Weber said. “You could save tons of money by not re-admitting people. … They have all that data. The question is, are they analyzing it and figuring out why people go back?”
The NetApp-sponsored study found that only about 40 percent of people are using the data they’ve collected and making decisions based upon it, so there is plenty of room for improvement.
“A lot of this is innovation,” he said. “The amount of data that people have, the tools don’t exist today. This is a great opportunity for IT companies to innovate.”
In the past, data was organized and analyzed in relational databases, which rely on the data sharing a consistent structure. Now, much of the data coming in is unstructured, arriving in varied formats such as imagery and video.
“How you search and mine that kind of information is really pretty complicated,” Weber said. “That doesn’t fit nicely into a relational database. That’s the piece of data that is exploding exponentially is the unstructured side.”
According to Weber, the number one thing that agencies need to do right now is determine who owns the data and who’s responsible for mining it. In addition, agencies need to tackle the unstructured side of data analysis.
“A lot of tools exist for structured data,” Weber said. “They’ve got to figure out what investments they need to make in the unstructured side.”
To that end, the Obama administration recently announced the Big Data Initiative, committing more than $200 million in research-and-development investments.