Levine, who spoke on a panel at the recent AFCEA Bethesda Health IT day in Bethesda, Maryland, said scientists don’t have to worry about cybersecurity, software updates or even the system management side of running a network when they use the cloud.
“If an investigator comes to me and says, ‘I need X,’ one of my lanes to look in is how we can utilize the cloud to stand up that compute resource and manage the security of it and manage access to it without having to have a physical box located somewhere on campus on a data center or in a lab,” Levine said.
The first step in getting scientists and others at NCI to accept the cloud was getting them to recognize they have to give up control of the physical technology.
“First, they wanted to have it. There is something about the bird in the hand thing, having a computer in your lab that you can see and you know it’s yours. The first step, which I’ve spent the last four or five years doing, was getting them to let us take physical servers and put them in data centers or stand up new servers for them in data centers,” Levine said on Ask the CIO. “Now that I’ve gotten them to make that leap, it’s super easy to take that next leap and say, ‘do you care where the computer is? Do you care where the compute resources are taking place?’ Maybe 10 years ago they would care because the data sets were big enough and the data links to campus were small enough that maybe it would take some order of magnitude longer to push data to the cloud or get data from the cloud. But now those issues are largely solved.”
He added that NCI scientists can now focus on the science rather than the IT infrastructure, and when they need more compute power or storage, it’s easy to spin up a new cloud server.
Jeff Shilling, the acting CIO of the National Cancer Institute, said the cloud is playing a similar but distinct role in improving collaboration and the sharing of data sets.
Shilling said the NCI created the Genomic Data Commons in the cloud to make large data sets of genomic data accessible to a wider audience.
NCI, and the National Institutes of Health more broadly, is looking at this data commons approach for other data sets as well.
Shilling said one big lesson NCI learned when it set up the Genomic Data Commons portal in the cloud is that scientists and other stakeholders need to be trained on how to upload data and use the information.
“When we think about these cloud resources or any kind of compute resources, you really have to have that human component. It’s not just the technology they are lacking, but it’s the understanding of how to make it valuable,” he said. “We have a lot of data, many, many petabytes of data. It can’t be moved around very easily. So for the last four years, the NCI have been working on something called the cloud pilot, which is basically a way to do compute and have that right next to the data.”
NCI chose three groups of vendors to find the best solution, which needed to address identity and access management, storage and the fact that the data resides with different cloud providers.
“They’ve graduated from that pilot phase and now we call them the cloud resources. While we are funding them now to be in place, they will eventually be running by themselves so we will fund researchers and then researchers will have some sort of token so they can go and compute,” Shilling said. “That model is where we basically have a lot of storage and then compute, side-by-side, so researchers at small places can compete with researchers at large places.”
Shilling said researchers could pick the cloud services that best meet their needs, which keeps the vendors competitive and responsive to stakeholder needs.
He said the Genomic Data Commons and the cloud resources are a model for how scientific computing can be done going forward.