How HHS aims to keep AI-enabled medical devices working as they should

The FDA has approved nearly a thousand AI-enabled medical gadgets. The question is how to keep these devices working properly over time.

Tom Temin (@tteminWFED)

September 17, 2024 1:22 pm

Artificial intelligence has penetrated deeply into the world of medical devices. The FDA has approved nearly a thousand AI-enabled medical gadgets. Given the nature of AI, the question is how to keep these devices working properly over time. That’s a question the Advanced Research Projects Agency for Health seeks to answer. For details, the Federal Drive with Tom Temin spoke to ARPA-H program manager Dr. Berkman Sahiner.

Interview transcript:

Tom Temin Dr. Sahiner, good to have you with us.

Dr. Berkman Sahiner Good to be here, Tom. Thank you.

Tom Temin And before we get to the funding mechanism and the programmatic aspects of this, tell us, in your words, what the problem is you’re looking at or the challenge with medical AI devices or AI-powered devices.

Dr. Berkman Sahiner So, as you mentioned, currently, there are many medical decision support tools that are in the clinics aimed at helping clinicians make better decisions and diagnoses in their clinical work. And typically, the performance of these models is evaluated with independent test data before they go on the market. This is, of course, very important for the FDA if the tool is under FDA’s jurisdiction as a medical device. But it’s also important for clinicians, even for tools that are not FDA devices, to know what kind of performance to expect from the tool. So this is before the device goes on the market. But once the device goes on the market, as we know, the clinics are very dynamic environments. So things change. For example, the data into an AI model might be data coming from a CT scanner, and that clinical site might buy a new scanner that was not in the training or test datasets before. So now how the AI model performs with this new scanner that the data is coming from is a question that is typically not studied in the clinic as time goes on. So the purpose is to maintain or even exceed that peak performance that the model has when it goes into the market.

Tom Temin And let’s back up for a moment. There’s almost 1,000 devices that the FDA has approved. What do these devices do? Are they like surgical robotic machines or are they heart monitors? I mean, give us a sense of what these devices do. Devices is kind of a broad word.

Dr. Berkman Sahiner Right. So these devices, they perform a variety of functions. They range from virtual assistants to devices that enhance medical images and diagnostic tools. So there is a wide range, which is led mostly by applications in radiology. We see about three quarters of such devices are in the area of radiology. And then there is a sizable portion that is intended for cardiology and also other specialties such as pathology.

Tom Temin And we should point out your program has a long name: Performance and Reliability Evaluation for Continuous Modifications and Usability of AI. Precise, I will call it. And you’re reaching out to industry for this with a funding opportunity. Tell us about that.

Dr. Berkman Sahiner Yes. So we are reaching out, of course, to industry because they are typically the manufacturers of these AI-enabled tools, but also to academic institutions and clinics to really work together to achieve the goals of this program, which are, again, first of all, to monitor the performance of these models in the clinic, but also to understand when that performance starts to degrade and then to understand ways of getting to the root cause of the problem and suggesting automated tools to mitigate the performance decline. So there are a lot of steps, and we believe that really you need a collaboration between the AI device manufacturers and academia and clinics and others to achieve this goal.

Tom Temin We are speaking with Dr. Berkman Sahiner. He is a program manager for the Precise AI program at the Advanced Research Projects Agency for Health. And do you envision people running, say, workloads that come from real life against the machines but not actually in a clinical setting and doing comparisons? Is that the kind of work you envision?

Dr. Berkman Sahiner Yes. So the vision is that we would really like the data to be collected in the clinic as the device is running. But it may not be a device that the clinician really takes into consideration when they’re making their diagnosis, because to be able to do that, we have to have a device that’s really authorized or that’s really working in the clinic. And we would like to have both prototype systems as well as established tools to be used. So we are envisioning a system where the device is running in the clinic and you gather the output of the AI model and compare it to a ground truth, as well as it can be established in a scalable manner, and see, during that clinical use, how the performance is changing over time.

Tom Temin And to qualify for funding, what do you need from the organization? What do they have to demonstrate they can do?

Dr. Berkman Sahiner Well, we have different technical areas that we have defined in our program. So the solicitation is already out, and there are different technical areas. The first technical area is to be able to consistently and continuously develop a surrogate ground truth that can be compared to the AI model output. So we would like the proposers to be able to show that they have a good method to be able to do that consistently across many electronic health records from which this surrogate ground truth information will be extracted, and to be able to have enough diversity in the clinical sites from which they gather information so that our program helps all Americans. And the second technical area is performance monitoring and automated root cause analysis and mitigation. And again, in this technical area, we would like the proposers to be able to show that they can sort of distinguish between natural variation and a performance decline and propose ways of understanding what the root cause of performance degradation is. And we have other technical areas that I can go into as well.
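
As an illustration only, and not a method described by ARPA-H or Dr. Sahiner, one simple way to separate natural variation from a genuine performance decline is a control-limit check borrowed from statistical process control. The sketch below assumes a known baseline accuracy from pre-market testing and a recent window of cases in which the AI output either agreed or disagreed with the surrogate ground truth. The function names, the 3-sigma threshold and the numbers are all hypothetical.

```python
from __future__ import annotations

import math


def lower_control_limit(baseline_accuracy: float, window_size: int, z: float = 3.0) -> float:
    """Lowest windowed accuracy still consistent with sampling noise.

    Uses the normal approximation to the binomial: the standard error of
    an accuracy estimate over `window_size` cases is sqrt(p * (1 - p) / n).
    """
    standard_error = math.sqrt(baseline_accuracy * (1.0 - baseline_accuracy) / window_size)
    return baseline_accuracy - z * standard_error


def flags_decline(agreements: list[bool], baseline_accuracy: float, z: float = 3.0) -> bool:
    """Return True if the window's accuracy falls below the control limit.

    `agreements` holds one entry per case: True when the AI output matched
    the surrogate ground truth, False otherwise (an assumed data layout).
    """
    if not agreements:
        raise ValueError("need at least one case in the window")
    observed = sum(agreements) / len(agreements)
    return observed < lower_control_limit(baseline_accuracy, len(agreements), z)


# Hypothetical example: baseline accuracy of 0.90, windows of 100 cases.
normal_week = [True] * 85 + [False] * 15   # 0.85: within natural variation
bad_week = [True] * 75 + [False] * 25      # 0.75: below the control limit
print(flags_decline(normal_week, 0.90), flags_decline(bad_week, 0.90))
```

A check like this only signals that something has changed. Finding the root cause, such as a new scanner or a shift in the patient population, would need further analysis.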

Tom Temin Well, I think we get the idea for my audience here. And I guess my question is, can the clinical situation of a device contribute in a way that can make it better or worse? Suppose you have a lung imaging apparatus. I’m just making this one up. And it’s in Pennsylvania. So everyone that comes in is a former coal miner. And so the machine sees coal miners’ lungs a thousand times but hasn’t seen a lung that was not a coal miner’s in a thousand cases. That’s the kind of thing that can throw the machine off.

Dr. Berkman Sahiner Exactly. Yeah. A perfect example of what the program wants to do. So I mentioned performance decline over time, but it can also be a performance decline or performance change among different clinical sites. So like in your example, the device may have been trained with the general U.S. population and has a good performance overall for the average population. But when you go to Pennsylvania and you have coal miners, a clinical site using this device may see a degraded performance compared to the average. So then the idea is to be able to make those local modifications to the version that’s being run at that specific site so that now the characteristics of the AI model are tuned toward their particular patient population.
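
To make the site-to-site comparison concrete, here is a minimal sketch, again an illustration under assumptions and not the program's own tooling. It takes a list of (site, agreed-with-ground-truth) records and reports how far each site's accuracy sits above or below the pooled accuracy. The site labels and counts are invented for the example.

```python
from __future__ import annotations

from collections import defaultdict


def site_accuracy_gaps(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Map each site to (site accuracy - pooled accuracy).

    Each record is (site_name, agreed), where `agreed` is True when the AI
    output matched the ground truth for that case (an assumed data layout).
    A clearly negative gap suggests the model underperforms at that site.
    """
    if not records:
        raise ValueError("need at least one record")
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for site, agreed in records:
        totals[site] += 1
        hits[site] += int(agreed)
    pooled = sum(hits.values()) / len(records)
    return {site: hits[site] / totals[site] - pooled for site in totals}


# Hypothetical data: a general-population site and a coal-region site.
records = (
    [("general_site", True)] * 9 + [("general_site", False)] * 1
    + [("coal_region_site", True)] * 6 + [("coal_region_site", False)] * 4
)
for site, gap in site_accuracy_gaps(records).items():
    print(f"{site}: {gap:+.2f}")
```

In practice a gap would need to be weighed against each site's sample size before concluding that a local modification is warranted.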

Tom Temin And the output of what it is that you fund, what are you expecting back from the organizations?

Dr. Berkman Sahiner Well, we have a number of metrics that we have defined in our program solicitation, so we would definitely like the performers to meet those metrics. But more importantly, we would like to have, first of all, a set of tools that come out of this program that will help any manufacturer or any developer to continuously monitor and maintain the performance of their model. And in addition, we would like tools or maybe devices that come out of this program that really go into the market and make a difference. So right now, we, of course, have not started the research, but we are really thinking about how these developed tools may be transitioned into the market towards the end of the program.

Tom Temin Dr. Berkman Sahiner is program manager for the Precise AI program at the Advanced Research Projects Agency for Health. Thanks so much for joining me.

Dr. Berkman Sahiner Thank you very much.

Tom Temin Appreciate it. And we’ll post this interview along with a link to more about the medical devices project at federalnewsnetwork.com/federaldrive. Subscribe to the Federal Drive wherever you get your podcasts.