Understanding the data is the first step for NIH, CMS to prepare for AI

NIH and CMS have several ongoing initiatives to ensure employees and their customers understand the data they are providing as AI and other tools gain traction.

The National Institutes of Health’s BioData Catalyst cloud platform is just starting to take off, despite being nearly six years old.

It already holds nearly four petabytes of data and is preparing for a major expansion later this year as part of NIH’s goal to democratize health research information.

Sweta Ladwa, the chief of the Scientific Solutions Delivery Branch at NIH, said BioData Catalyst already provides access to clinical and genomic data, and the agency wants to add imaging and other data types in the next few months.

“We’re really looking to provide a free and accessible resource to the research community to be able to really advance scientific outcomes and therapeutics, diagnostics to benefit the public health and outcomes of Americans and really people all over the world,” Ladwa said during a recent panel discussion sponsored by AFCEA Bethesda, an excerpt of which ran on Ask the CIO. “To do this, it takes a lot of different skills, expertise and different entities. It’s a partnership between a lot of different people to make this resource available to the community. We’re also part of the larger NIH data ecosystem. We participate with other NIH institutes and centers that provide cloud resources.”

Ladwa said the addition of new datasets to the BioData Catalyst platform means NIH can also provide new tools to help mine the information.

“For imaging data, for example, we want to be able to leverage or build in tooling that’s associated with machine learning because that’s what imaging researchers are primarily looking to do is they’re trying to process these images to gain insights. So tooling associated with machine learning, for example, is something we want to be part of the ecosystem which we’re actively actually working to incorporate,” she said. “A lot of tooling is associated with data types, but it also could be workflows, pipelines or applications that help the researchers really meet their use cases. And those use cases are all over the place because there’s just a wealth of data there. There’s so much that can be done.”

For NIH, users in the research and academic communities are driving both the datasets and the associated tools. Ladwa said NIH is trying to make it easier for those communities to gain access.

NIH making cloud storage easier

That is why cloud services have been and will continue to play an integral role in this big data platform and others.

“The NIH in the Office of Data Science Strategy has been negotiating rates with cloud vendors, so that we can provide these cloud storage free of cost to the community and at a discounted rate to the institute. So even if folks are using the services for computational purposes, they’re able to actually leverage and take benefit from the discounts that have been negotiated by the NIH with these cloud vendors,” she said. “We’re really happy to be working with multi-cloud vendors to be able to pass some savings on to really advanced science. We’re really looking to continue that effort and expand the capabilities with some of the newer technologies that have been buzzing this year, like generative artificial intelligence and things like that, and really provide those resources back to the community to advance the science.”

Like NIH, the Centers for Medicare and Medicaid Services is spending a lot of time thinking about its data and how to make it more useful for its customers.

In CMS’s case, however, the data centers on the federal healthcare marketplace, and the tools are meant to make citizens and agency employees more knowledgeable.

Kate Wetherby, the acting director for the Marketplace Innovation and Technology Group at CMS, said the agency is reviewing all of its data sources and data streams to better understand what they have and make their websites and the user experience all work better.

“We use that for performance analytics to make sure that while we are doing open enrollment and while we’re doing insurance for people, that our systems are up and running and that there’s access,” she said. “The other thing is that we spend a lot of time using Google Analytics, using different types of testing fields, to make sure that the way that we’re asking questions or how we’re getting information from people makes a ton of sense.”

Wetherby said her office works closely with both the business and policy offices to bring the data together and ensure it’s valuable.

“Really the problem is if you’re not really understanding it at the point of time that you’re getting it, in 10 years from now you’re going to be like, ‘why do I have this data?’ So it’s really being thoughtful about the data at the beginning, and then spending the time year-over-year to see if it’s something you should still be holding or not,” she said.

Understanding the business, policy and technical aspects of the data becomes more important for CMS as it moves more into AI, including generative AI, chatbots and other tools.

CMS creating a data lake

Wetherby said CMS must understand its data before applying these tools.

“We have to understand why we’re asking those questions. What is the relationship between all of that data, and how can we improve? What does the length of data look like because we have some data that’s a little older and you’ve got to look at that and be like, does that really fit into the use cases and where we want to go with the future work?” she said. “We’ve spent a lot of time, at CMS as a whole, really thinking about our data, and how we’re curating the data, how we know what that’s used for because we all know data can be manipulated in any way that you want. We want it to be really clear. We want it to be really usable. Because when we start talking in the future, and we talk about generative AI, we talk about chatbots or we talk about predictive analytics, it is so easy for a computer if the data is not right, or if the questions aren’t right, to really not get the outcome that you’re looking for.”

Wetherby added that another key part of getting the data right is improving the user experience and determining how CMS can share that data across the government.

In the buildup to using GenAI and other tools, CMS is creating a data lake to pull information from different centers and offices across the agency.

Wetherby said this approach lets the agency place the right governance and security around the data, since it crosses several types, including clinical and claims information.
