‘Going FAR before going FAIR,’ while juggling multiple clouds

Federal Monthly Insights - Cloud Migration Strategy and Cloud FinOps

Wednesday, October 28, 2021

The National Institutes of Health (NIH) recently added Microsoft Azure to its two existing cloud providers, AWS and Google Cloud. Some call those providers “the big three.” NIH calls the arrangement a “partnership.”

“We fund about 2,500 research and academic institutions around the country, with about 300,000 researchers at those institutions,” said Andrea Norris, the NIH chief information officer and director of the Center for Information Technology.

Through these providers, NIH can use state-of-the-art data storage and compute capability, a crucial tool for accelerating the science.

“We’ve moved more than 100 petabytes of research data in just the last couple of years into the cloud, from programs that are being supported across many institutions,” Norris said on Federal Monthly Insights – Cloud Migration Strategy and Cloud FinOps.

“We’ve trained almost 4,000 people, and not only in just how to use cloud, but how to use cloud to do specific biomedical research activities. And the companies have been great and really partnering with us to tune their platforms, tools and training to our specific needs,” Norris said.

Working with AWS, Google Cloud and Microsoft Azure means interoperability is important but difficult.

“Andrea mentioned a couple of key terms that make up an acronym called FAIR: Findable, Accessible, Interoperable and Reusable,” Nick Weber, program manager of cloud services, said on Federal Drive with Tom Temin.

“We like to say that we’re going to go FAR before we go FAIR, because that ‘I’ part, the interoperability, is a really challenging one for lots of reasons,” Weber said.

“You have what normally would be highly competitive companies who want to differentiate themselves having to figure out ways and having the government to try to incentivize ways to allow data, tools and access across multiple clouds. When data exists on a program that I want to analyze, and my data are in one cloud, and another research program’s data on another cloud, how do we make that work? How do we do that efficiently and cost effectively, when again, the incentives aren’t necessarily there naturally, for that to happen?” Weber said.

“43 petabytes growing to well over 100 petabytes, now. That’s extremely large,” Weber said. “One petabyte is the equivalent of 20 million four-drawer filing cabinets full of text. And we have well over 100 petabytes. So to be able to work at that scale and then add the complexity of trying to do that across multiple cloud providers is really something that the research community as a whole, and NIH is looking heavily into to find solutions to enable that.”
