"Inside that synthetic version of the code, we inserted vulnerabilities for the competitors to find," AI Cyber Challenge program manager Andrew Carney said.
Cybersecurity and artificial intelligence are partners at the same dance. And, for a couple of years now, the Defense Advanced Research Projects Agency has run a project called the AI Cyber Challenge. No small effort, it awarded nearly $30 million in prizes to teams able to design AI systems to protect critical code. Program manager Andrew Carney joined the Federal Drive with Tom Temin to discuss more.
Interview transcript:
Tom Temin Mr. Carney, good to have you with us.
Andrew Carney Great to be here. Thanks so much for having us.
Tom Temin Now, this program, this challenge, tell us more about it. Is it a two-year event or is it something you’ve done for a couple of years, yearly?
Andrew Carney It’s a two-year competition that’s challenging the best and brightest in AI and cybersecurity to defend the software that underpins our modern lives.
Tom Temin All right. And who are the teams competing? They have funny names. Are they mostly academic, corporate or what?
Andrew Carney Well, we’ve got an interesting mix of competitors. We saw a really strong response when we initially put out the call for the initial competition, and our semifinal event saw 91 teams from around the world come to potentially compete. And the top seven that you see here are a good mix of both university affiliated and private company affiliated teams.
Tom Temin And you said from around the world, are people in other countries eligible for prizes from DARPA?
Andrew Carney Every competing team must have a U.S. entrant official, must be represented by someone from the U.S., but they could be affiliated with foreign institutions.
Tom Temin Okay. And let’s get into the meat of it, though. What is it they’re actually presenting to DARPA that you’ll choose from?
Andrew Carney We just finished up our semifinal event at Defcon this year. The way that event was structured, the teams were presented with a set of five challenge programs, effectively real open source software projects that we had created a synthetic version of. So we’d taken that base real software and we created a synthetic fork. We effectively developed that real software in a synthetic manner so that it would be representative of the real project but suitable for the competition’s purposes. Inside that synthetic version of the code, we inserted vulnerabilities for the competitors to find as a way of testing their fully automated, LLM-powered cyber reasoning systems. So the competitors developed these systems to both find those synthetic vulnerabilities and then patch those vulnerabilities, which we then validated and tested.
Tom Temin Well, let me ask you this, you mentioned LLM, large language model based kind of search and rescue, if you will, for these vulnerabilities. People have been finding, hackers have been finding vulnerabilities for decades in code without large language models. What’s the difference now with this particular set of code?
Andrew Carney So there are two major differences with our approach. One is that the challenge of finding vulnerabilities at scale is one that today’s technology still can’t address fully. It takes a tremendous amount of time and effort for us to find vulnerabilities in the software that we rely on, far more time than our population of experts has available to sufficiently address them. The other piece of this is that these same systems, after automatically finding a vulnerability, must emit a patch. They have to fix what they find. And we have strongly incentivized that through our scoring algorithm: the teams that are successful are the ones that can patch as well as find vulnerabilities. That’s a major asymmetry. Historically, automated program repair, or automated patching, is something that has lagged behind our ability to discover vulnerabilities. And this creates a difficult situation, especially in the open source community, where open source projects may be very highly used but not have the resources or the software development expertise supporting them to address large numbers of vulnerabilities that are found. So AIxCC is challenging the competitors to fix everything they find, and our hope is that that will then make automated vulnerability discovery and repair something that we can leverage at scale.
Tom Temin We’re speaking with Andrew Carney. He’s the program manager for the AI Cyber Challenge at DARPA. And you said that there’s more vulnerabilities than humankind can actually find in any reasonable time in software. Therefore, the need for this automation through AI. How big were these blocks of code that you embedded the vulnerabilities into? And is that a factor in how easy it is to find them? For example, you know, if you put a vulnerability in 100 lines of code versus ten of them in 10 million lines of code.
Andrew Carney Absolutely. One of the challenge projects that we adapted for the competition is the Linux kernel, which has millions of lines of code. We didn’t use the entire kernel, but we used hundreds of thousands of lines of it as part of the challenge. And so these challenges were significant. They were large and there was a lot for the competitors to look through.
Tom Temin Sure. And what’s the criteria for success? In other words, do they have to find all of the vulnerabilities that you know about and fix them successfully?
Andrew Carney I mean, it’s a competition. So the competitors were their own bar. We did have a minimum expectation that teams, to be successful, would need to find and fix vulnerabilities in multiple categories or classes. There’s a taxonomy called the Common Weakness Enumeration. MITRE maintains it. And we picked two vulnerabilities specifically in the top 25 most dangerous software weaknesses, or CWEs. And we saw teams find multiple classes of CWE, which was important because they may have different strategies depending on the class. And then we saw them patched as well.
Tom Temin Right. It sounds like this methodology could also help with so-called unknown unknowns.
Andrew Carney Absolutely. In fact, during the competition, we anticipated the possibility that a team could find a real vulnerability because we were using real code and adding synthetic code to it. There’s still plenty of real open source software under evaluation for the competition. And in fact, one of the teams did find a real bug in SQLite and we were able to report that to the maintainers and get it patched.
Tom Temin All right. So now you’ve had a down select of teams following Defcon. What happens next?
Andrew Carney So we are very excited right now. The results of the semifinal competition (ASC) felt encouraging. We saw the teams find and patch vulnerabilities in multiple classes. And we’re also cognizant of the fact that this technology on the LLM and Gen AI side is rapidly evolving. So we are in the process of doing some experimentation and planning for finals. And we’re really excited to share more details about what the finals competition will look like in the coming months.
Tom Temin And the dollars available are pretty substantial relative to most federal challenge grants.
Andrew Carney Yes, I mean, I think that the problem and the impact are also substantial. So I think we’re in line with the scale of the problem and the need to address it.
Tom Temin And how do you know or ensure or what’s the prospect of someone using this methodology in reverse and maybe putting in vulnerabilities and using it against the world that’s trying to be safe?
Andrew Carney You know, this is a purely defensive competition. These systems that reason over software are tuned and focused on discovery and patching. That’s really where all of the focus is. I mean, the work in synthetic bug insertion from other institutions, we’re tracking it, but we don’t see that as an issue for this competition.
Tom Temin And what’s the timeline? When will this all wrap up?
Andrew Carney We are very excited to have the AIxCC finals at Defcon in 2025, and soon after we’ll be releasing the code from the winners as open source. So everyone will be able to benefit from the technology being developed under AIxCC.
Tom Temin Andrew Carney is the program manager for the AI Cyber Challenge at DARPA. Thanks so much for joining me.
Andrew Carney Thanks for having me.
Tom Temin We’ll post this interview along with a link to more information at federalnewsnetwork.com/federaldrive. Hear the Federal Drive on demand. Subscribe wherever you get your podcasts.
Copyright © 2025 Federal News Network. All rights reserved.
Tom Temin is host of the Federal Drive and has been providing insight on federal technology and management issues for more than 30 years.