The Department of Health and Human Services is turning to big data to improve the security of its computer networks.
At the same time, HHS is striving to make more data accessible to the public and across all the agency bureaus.
Kevin Charest, the chief information security officer at HHS, said balancing these two mandates requires the ability to explain to the program managers why data security is important beyond the typical “because it is” response.
“Typically security folks tend to talk about it from a confidentiality aspect, protection and that sort of thing, and unfortunately that tends not to resonate with actual end users, whether they be producers of data or consumers of data,” Charest said after he spoke on a panel discussion Thursday at the AFFIRM/GITEC Colossal Data conference in Washington. “They are really more interested in the integrity, from a research standpoint, and the availability, I want what I want when I want it type of thing. Particularly as it relates to HHS, we have a tremendous amount of research data, and very often scientists have a tendency to see security as a nuisance and getting in the way of science. But the reality is what we found is by going to them and having perfectly open and honest conversations about what would happen if the integrity of your research was to be compromised. You begin to relate it to their world view, and all of a sudden security becomes less of an evil and more of a necessity.”
He said if you relate security to the needs of the individual, it’s a much easier sell.
Like most agencies, HHS creates and holds a lot of data, especially the sensitive kind, whether it’s personnel or health information, and there is more pressure to share data among bureaus.
Charest said as the data sets get bigger and bigger, the normal type of processing just doesn’t work.
APIs to Commerce data
Additionally, the Office of Management and Budget’s mandate to make information more accessible through Data.gov in a machine-readable format is adding another layer of complexity for agencies. President Barack Obama issued an executive order and OMB followed with implementation guidance in May.
The Commerce Department is trying to comply with that mandate by developing application programming interfaces (APIs) that will make data accessibility easier.
Simon Szykman, Commerce’s chief information officer, said at the event that the department has made more than 100,000 data sets available through Data.gov. OMB plans to revamp the website later this year, and already has included a list of APIs available to users.
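The machine-readable format the OMB guidance calls for amounts to publishing dataset listings that software, not just people, can consume. As a rough sketch of what a consumer of such a catalog might do, the snippet below parses a small invented catalog excerpt and pulls out the publicly accessible data sets. The field names (`dataset`, `accessLevel`, `title`) and the sample entries are illustrative assumptions, not an exact rendering of any agency's actual catalog.

```python
import json

# Invented excerpt of an agency's machine-readable dataset catalog.
# Field names here are illustrative, not an exact schema.
catalog_text = """
{
  "dataset": [
    {"title": "Export Statistics", "accessLevel": "public",
     "format": "CSV"},
    {"title": "Internal Budget Figures", "accessLevel": "non-public",
     "format": "XLSX"},
    {"title": "Weather Station Readings", "accessLevel": "public",
     "format": "JSON"}
  ]
}
"""

def public_datasets(catalog_json: str) -> list[str]:
    """Return titles of data sets marked public in the catalog."""
    catalog = json.loads(catalog_json)
    return [d["title"] for d in catalog["dataset"]
            if d.get("accessLevel") == "public"]

print(public_datasets(catalog_text))
# → ['Export Statistics', 'Weather Station Readings']
```

Because the listing is structured data rather than a web page, an API like the one Commerce is building can serve it directly and tools can filter, merge, or index it automatically.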
He said Commerce needs to improve the governance of data by building it in early on from a lifecycle perspective.
In addition to individual data sets from all the agencies, the administration is funding research and development on how to better harness big data.
Fen Zhao, a staff associate in the Directorate for Computer and Information Science and Engineering at the National Science Foundation, said her agency and the Office of Science and Technology Policy issued a request for information earlier this year to see how multi-stakeholder partnerships could come together to solve big data problems.
Zhao said NSF and OSTP will announce a new round of research and development projects this fall.
Mash-ups offering benefits
While the administration focuses on R&D of big data, several agencies are mashing up data to improve how they meet their missions.
The Agriculture Department is combining data on wildfires with information about crops to better understand the path of the fire. Then the Forest Service can redirect the blaze away from the corn or soybeans or other farmland.
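At its simplest, a mash-up like USDA's is a join: overlay one data set (projected fire path) on another (crop field locations) and keep the intersections. The toy sketch below assumes both sets have been reduced to shared grid coordinates; the grid cells and field names are invented for illustration and real USDA data would be far richer.

```python
# Invented grid cells a wildfire is projected to reach.
fire_path = {(2, 3), (2, 4), (3, 4), (4, 4)}

# Invented crop fields keyed to the same grid.
crop_fields = {
    "corn_field_A":    (2, 4),
    "soybean_field_B": (7, 1),
    "wheat_field_C":   (3, 4),
}

# The mash-up: fields whose location falls inside the fire's path.
at_risk = sorted(name for name, cell in crop_fields.items()
                 if cell in fire_path)
print(at_risk)
# → ['corn_field_A', 'wheat_field_C']
```

The value comes from the combination: neither data set alone says which fields to protect, but joined on a common key they do.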
Charles McClam, the USDA deputy CIO, said the agency is developing a big data strategy to decide which systems would benefit the most from analytics technology and mashing data together.
He said the strategy could be completed in the next three to six months.
HHS’s Charest said the agency is trying to break the data silos that currently exist.
He said there is great value in bringing together information from, for example, the Food and Drug Administration, the Centers for Disease Control and Prevention and the National Institutes of Health.
“We recognize the value of unlocking the data and bringing it together. So, what we have to do is not just take the underlying siloed security, but overlay a level of security for the interoperability and collaborative nature of the infrastructure,” he said. “So we are designing it now. We are looking at those pieces and deciding what goes into that mash-up and what doesn’t make sense, recognizing it will be an evolutionary and iterative process, but security is being designed up front before we do any data collaboration.”
Charest said HHS created a service catalog approach that lists specific security tools and processes that others in the agency can use.
“We augment our operating divisions, who have more or less budgets in this particular area,” he said. “We want to maintain a certain minimum benchmark across the agency. It’s been very effective. It’s a more efficient way to spend dollars. It also allows folks to focus on their core competency instead of trying to be everything in every situation. They can rely on us when they need that type of specific expertise.”
Security is a big data problem
Charest said big data also is making HHS’s networks and systems more secure through its security operations center (SOC). He said the SOC is a central coordination center bringing together threat data from across the agency.
“The beauty of that is it has allowed us now when we see something happening at NIH, we can span the entire network to see if it’s occurring anywhere else,” he said. “Previous to that inception, it wasn’t possible. It was more relationship based. Now it’s more automated and more real time so we can act more aggressively to deal with that.”
HHS paid for the SOC through Recovery Act funding.
Part of the SOC’s real-time capability to see what’s happening on HHS’s networks is through its deep dive analysis team.
“What we recognized is we are getting a lot of alerts, a lot of metadata alerts, data about data. What we wanted to do was look for that needle in the haystack, that anomalous pattern that some of the bad actors utilize,” Charest said. “To do that, we really had to look at our security data in a big data kind of way, and bringing it all together, but now applying those types of analytics to it as opposed to the more traditional operations.”
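The needle-in-the-haystack hunt Charest describes is, in analytics terms, outlier detection over aggregated alert metadata. One minimal way to sketch it: pool alerts from every operating division, count them per source host, and flag hosts whose volume sits well above the agency-wide norm. The hostnames, alert types, and one-standard-deviation threshold below are all invented assumptions, not HHS's actual analytics.

```python
from collections import Counter
from statistics import mean, stdev

# Invented pooled alert metadata: (source_host, alert_type) pairs
# drawn from several operating divisions' feeds.
alerts = (
    [("nih-ws-01", "login_fail")] * 3 +
    [("cdc-ws-07", "login_fail")] * 2 +
    [("fda-ws-11", "login_fail")] * 40 +   # anomalous burst
    [("hhs-ws-02", "port_scan")] * 4
)

counts = Counter(host for host, _ in alerts)
mu, sigma = mean(counts.values()), stdev(counts.values())

# Flag hosts more than one standard deviation above the norm —
# an arbitrary illustrative threshold.
anomalies = sorted(h for h, c in counts.items() if c > mu + sigma)
print(anomalies)
# → ['fda-ws-11']
```

The point of pooling first is that a burst that looks modest inside one division's feed can stand out sharply against the agency-wide baseline.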
He said the mission of the deep dive analysis team is to identify the tools, techniques and procedures, and indicators of compromise for those specific actors.
“We are up to the point now where we are identifying somewhere in the neighborhood of 1,000 a month that we are then utilizing to inform our tools so we prevent them on the front end,” Charest said. “That 1,000 a month is what we are producing for our networks, for our sensors to consume as indicators. We are in essence creating a profile and then we are putting that profile in the system so our sensors can recognize it and stop it before it ever starts.”
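The profile-to-sensor loop Charest describes can be pictured as a lookup: analysts publish indicators of compromise, and front-end sensors check each event against them before it reaches the network. The sketch below is a minimal illustration of that idea; the indicator values, event fields, and matching logic are invented, not HHS's actual tooling.

```python
# Invented indicator profiles published by a deep-dive analysis team.
# The addresses and domain use reserved documentation ranges/names.
indicators = {
    "ip": {"203.0.113.9", "198.51.100.4"},
    "domain": {"bad-actor.example"},
}

def sensor_check(event: dict) -> bool:
    """Return True if the event matches a known indicator profile."""
    return (event.get("src_ip") in indicators["ip"]
            or event.get("domain") in indicators["domain"])

# Invented traffic events as a sensor might see them.
events = [
    {"src_ip": "192.0.2.10",  "domain": "intranet.example"},
    {"src_ip": "203.0.113.9", "domain": "intranet.example"},
    {"src_ip": "192.0.2.11",  "domain": "bad-actor.example"},
]

blocked = [e for e in events if sensor_check(e)]
print(len(blocked))
# → 2
```

Keeping the indicators in a shared store is what makes the loop agency-wide: a profile built from one division's incident immediately arms the sensors in every other division.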