To understand the computing requirements of the National Institutes of Health, you need a sense of NIH’s scale. Each of its 27 semi-independent institutes and centers focuses its research on a different disease. Five of every six dollars of its $45 billion appropriation fund grants, which go to some 2,500 research organizations each year, representing 300,000 individuals. NIH networks move tens of petabytes of data every day.
According to Nick Weber, the acting director of the Office of Scientific Computing Services with NIH’s Center for Information Technology, the high level of telework in recent years only sped up NIH’s extensive work in cloud computing.
“We actually did have quite a few of our research programs, both at NIH and those that we fund as a funding agency, starting to look at cloud and starting to look at ways to share data more broadly or collaborate more expansively,” Weber said at Federal News Network’s Cloud Exchange 2023.
A bonus of those established cloud skills and contracts surfaced when the pandemic sent people home, Weber said: NIH activity barely missed a beat.
For NIH, Weber said, the cloud enables an efficient way for researchers to share datasets and explore interrelationships among the pathologies they study. He said NIH maintains certain in-house computing facilities for specialized purposes, as well as access to high performance computers, or supercomputers. Increasingly, he said, commercial cloud service providers offer computer clusters and node interconnections that simulate supercomputing, giving scientists more access to services supporting numerically intensive, “bursty” applications.
Weber leads a program called STRIDES, which stands for Science and Technology Research Infrastructure for Discovery, Experimentation and Sustainability. It provides cloud services to NIH institutes and centers from AWS, Google Cloud and Microsoft Azure.
“We say we’re making ‘strides’ in getting researchers the tools and technologies that they need to accelerate research outcomes, and to accelerate discovery,” Weber said.
STRIDES operates under the Center for Information Technology (CIT) but is optimized for the activities in its name. Other parts of the CIT deal with administrative systems for functions such as finance or personnel.
Weber said that in a federated organization like NIH, “there are institutes that have moved far ahead and are leaning forward and wanting to do innovative things.” Others are not as far along. He said a common set of cloud services provides “the benefit of not having to do something 27 times over.”
He added, “We can do a single network connection to the cloud. We can do standard cybersecurity controls … and all sorts of things to speed the overall [cloud] adoption across the entire NIH.”
Adoption includes use of the cloud for high performance computing (HPC), a capability Weber said didn’t exist a few years ago. You could scale an application up to invoke many nodes, “but you didn’t have the connections among them, the high-speed interconnects,” Weber said. “You didn’t have algorithms that could take advantage of distributed virtual machines in the cloud that were not co-located. But that’s changing.”
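Weber’s point about interconnects is the crux of cloud HPC: tightly coupled codes synchronize on every step, so each node-to-node hop counts. A minimal sketch of such a workload, assuming a cloud cluster with MPI and the mpi4py package installed (an illustration, not NIH’s actual code):

```python
# Minimal tightly coupled workload: every rank must synchronize with every
# other rank on each step, so performance hinges on the node interconnect.
# Launch across nodes with, e.g.: mpirun -n 64 python allreduce_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each node (rank) holds one shard of a large simulation state.
local_state = np.random.default_rng(rank).random(1_000_000)

for step in range(100):
    local_state *= 0.99  # stand-in for real per-node computation
    local_sum = np.array([local_state.sum()])
    global_sum = np.empty(1)
    # Collective call: all ranks block until the reduction completes,
    # which is where a slow interconnect stalls the whole cluster.
    comm.Allreduce(local_sum, global_sum, op=MPI.SUM)

if rank == 0:
    print(f"global sum after 100 steps: {global_sum[0]:.3f}")
```

On-premises supercomputers solve this with purpose-built fabrics; Weber’s observation is that cloud providers now offer fast enough interconnects between co-scheduled instances to run the same style of code.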
He said NIH researchers need cloud alternatives to supercomputers, and the commercial cloud industry has listened. He cautioned that cloud will augment, not replace, certain highly specialized computational machinery NIH will continue to invest in. But researchers who can’t wait in a queue for those machines “can go and use the cloud, if there are certain types of specialized services and artificial intelligence or machine learning that exist on the cloud that just rolled out that aren’t in our local computing environment.”
As this capability develops, it offers the chance to do intensive work, such as DNA sequencing or simulating proteins and how drugs dock to them, at lower price points than dedicated HPC. Sometimes, Weber said, researchers could spend $10,000 on supercomputing time before they knew it. Experience with how different workloads behave, combined with cloud economics, will make HPC more widely available.
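The “$10,000 before they knew it” risk comes down to simple arithmetic: cost scales with nodes times wall-clock hours times the per-node-hour price. A hypothetical pre-flight estimate (the price and job size below are invented for illustration, not NIH or vendor figures):

```python
# Hypothetical back-of-envelope cost check before launching a cloud HPC run.
NODE_HOUR_USD = 3.50  # assumed on-demand price per HPC-class node hour


def estimate_run_cost(nodes: int, hours: float,
                      price: float = NODE_HOUR_USD) -> float:
    """Estimated cost of a job: nodes x wall-clock hours x price per node hour."""
    return nodes * hours * price


# A 64-node sequencing job expected to run 18 hours:
cost = estimate_run_cost(nodes=64, hours=18)
print(f"Estimated cost: ${cost:,.2f}")  # $4,032.00 at the assumed rate
```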
The STRIDES program also provides cloud services to research organizations that receive NIH grants, 1,500 of them so far. Weber said extramural researchers generally use part of their grant money to pay for cloud use under STRIDES contracts, and the discounts under STRIDES help conserve taxpayer grant dollars.
Cloud brings order and faster access to computation in some of the external projects, Weber said, which helps grant outcomes. NIH leadership is therefore interested in the progress of STRIDES, he said.
“Many projects are quite large,” Weber said. “They can be consortia of hundreds or even up to 1,000 researchers working together on a major activity.”
He added, “We follow up with people and hear that just the quick access to computational capabilities, the tooling, the ability to collaborate outside of one’s individual institutions’ walls in these large consortia – those are the things that we hear as the benefits of cloud.”
By the same token, Weber said, STRIDES keeps track of grantees’ programming and computing practices to ensure they don’t burn through cloud hours, and grant dollars, because of inefficient applications.
He said a few “oops” moments are part of the cloud rite of passage. But the oversight “does result in not wanting to burn their grant money on extra computational cycles that they don’t have to.”
NIH has one other model for getting research organizations into the cloud, particularly those historically underrepresented in the world of grants, and with limited resources.
“These research institutions don’t necessarily have on-premises supercomputers. They don’t necessarily have all of the local capacity or even [data center] staffing,” Weber said.
So NIH offered supplemental grant funds specifically to move their intensive scientific applications to the cloud.
“That’s where we have to take an even more disciplined approach than in the first scenario, because in this scenario, it’s direct NIH money that we’re using for all of these groups,” he added.
Yet, Weber said, this model levels the playing field for research brainpower in tribal, racially diverse, or simply resource-constrained institutions so they can compete with the rich institutions in places like Boston, New York or Los Angeles.
Weber said he believes the STRIDES program has changed how its constituents think about the potential of computing, just as remote work, initially forced by the pandemic, helped people realize that work does not depend on location.
“If somebody’s asking their counterpart [about STRIDES] and they haven’t heard of it, people describe it as a way to get access to essentially somebody else’s computational data center and resources and services,” Weber said. “I think people have started to pick up on what that means, even if they didn’t necessarily know how computing happened before.”
Cloud has also led NIH to develop edge computing, Weber said. Experiments at the various centers and institutes often employ instruments, such as imaging equipment, that generate data locally.
“And there will be some initial processing on the local network for that instrument, to bring the data together,” Weber said. From there the data may migrate to the in-house high performance computing environment “to do some initial processing to make sure you have the datasets that you can then move to the cloud.” Once in the cloud, there’s the opportunity for further processing or launching a new collaboration.
“So that’s a pattern that we’re starting to see really emerge,” Weber said.
The general idea is to normalize data from differing sources and formats before combining it in the cloud.
“Then when it goes to the cloud as a unit, the data is available to the researchers,” Weber said. “One of the biggest challenges, I would say, in research is bringing different datasets together that were generated at different times, or by different people, or by different formats and different technologies.”
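Weber’s description maps onto a common data-engineering pattern: rename each source’s local fields to one shared schema, combine, then move the result to cloud storage as a unit. A minimal sketch in Python using pandas and the AWS boto3 client (AWS being one of the STRIDES providers); all file, column and bucket names here are invented for illustration:

```python
# A minimal sketch of the normalize-then-combine pattern described above.
import boto3
import pandas as pd

# Map each source's local column names onto one shared schema.
SCHEMA_MAPS = {
    "site_a.csv": {"SubjID": "subject_id", "Dt": "collected_on", "Val": "measurement"},
    "site_b.csv": {"participant": "subject_id", "date": "collected_on", "reading": "measurement"},
}

frames = []
for path, mapping in SCHEMA_MAPS.items():
    df = pd.read_csv(path).rename(columns=mapping)[list(mapping.values())]
    df["collected_on"] = pd.to_datetime(df["collected_on"])  # one shared date format
    df["source"] = path                                      # keep provenance per row
    frames.append(df)

# Combine into one normalized dataset, then move it to the cloud "as a unit."
combined = pd.concat(frames, ignore_index=True)
combined.to_parquet("combined_dataset.parquet")  # needs pyarrow or fastparquet
boto3.client("s3").upload_file("combined_dataset.parquet",
                               "example-research-bucket",  # hypothetical bucket
                               "studies/combined_dataset.parquet")
```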
To read or watch other sessions on demand, go to our 2023 Cloud Exchange event page.
Nick Weber, Acting Director, Office of Scientific Computing Services, Center for IT
Nick Weber is the Program Manager of Cloud Services at the NIH Center for Information Technology (CIT). He’s been supporting cloud computing, high-performance computing, and other scientific infrastructure activities to enable research within the NIH for 13 years. The past few years have been focused on the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative.
Tom Temin, Host, Federal Drive, Federal News Network
Tom Temin has been the host of the Federal Drive since 2006 and has been reporting on technology markets for more than 30 years. Prior to joining Federal News Network, Tom was a long-serving editor-in-chief of Government Computer News and Washington Technology magazines. Tom also contributes a regular column on government information technology.