The Defense Advanced Research Projects Agency is looking to see how artificial intelligence might be applied to the creation and maintenance of software, especially software controlling physical systems. It awarded a contract to a company called GrammaTech to develop this idea. To learn what it is and why it’s important, Federal Drive with Tom Temin turned to GrammaTech’s vice president of research, Alexey Loginov.
Tom Temin: Dr. Alexey Loginov, good to have you on.
Dr. Alexey Loginov: Great, thank you. Great to be here.
Tom Temin: So precisely what are you doing for DARPA under this project?
Dr. Alexey Loginov: Great question. The focus of this program is to try to help subject matter experts maintain and modernize cyber-physical systems more effectively. Cyber-physical systems are systems where there’s an element of cyber, of software, that controls physical aspects or hardware, such as maybe a smart thermostat, going up all the way to something like a nuclear power plant.
Tom Temin: What in the industry they call SCADA.
Dr. Alexey Loginov: Exactly. So SCADA is one of the important examples of CPS systems.
Tom Temin: And I guess there is a cybersecurity issue with the maintenance of the software too, does that come into the project?
Dr. Alexey Loginov: Absolutely. This specific project is focused more on modernization. But of course, an important element of modernization could be cybersecurity. So you try to figure out why the system is behaving the way it is, and if it’s misbehaving, you first need to understand what it’s doing and how it could be attacked. So that could be an element. Absolutely.
Tom Temin: And when you say subject matter experts could deal with this under this project, that is to say, as opposed to software coding people?
Dr. Alexey Loginov: Exactly, as opposed to software coding people. And maybe what’s even harder to get ahold of is the reverse engineers and cyber analysts who would break apart a system and understand what it does now, to help a physicist or a mathematician make the system adapt. Those are in short supply.
Tom Temin: And some of these systems go back quite a few years and the documentation and the people that originally developed it might be long gone. And so then that’s where the need for reverse engineering might come in.
Dr. Alexey Loginov: Exactly, exactly. The system was built, and in reality, in software, when someone builds some piece of software and has to look at it again a few months later, they ask themselves, what in the world did I do that for? And when a system has been in operation a decade or multiple decades, absolutely, this becomes a serious issue.
Tom Temin: So you’re looking to develop almost an agent that could understand the software in some automated way, and that then results in a methodology by which someone who’s not a coder could make adjustments.
Dr. Alexey Loginov: Exactly. So we are developing a kind of AI and machine learning based system, using a technology called transfer learning, where we try to analyze software. So we build many, many, many examples of math converted to source code, and then we try to reverse this process. We look at a binary and say, okay, this must have been the collection of mathematical formulas implemented in this binary.
Tom Temin: When you say binary, just define that for the lay listener.
Dr. Alexey Loginov: Binary is the actual executables, the actual zeros and ones, bits that finally run on the computer. So normally, code starts out being source code, then it’s compiled by compilers into binaries. And then we try to go from that final version back up to the math that led to the creation of the source code, and then the binary.
Tom Temin: So in some sense, you’re building a flashlight to look into what is now a black box.
Dr. Alexey Loginov: Exactly, exactly. Yeah, and binaries often do look like black boxes.
Tom Temin: How do you go about that? I mean, I think of artificial intelligence as algorithms that learn as they gather data. What is the basic architecture of what it is you’re trying to pursue?
Dr. Alexey Loginov: We are taking many, many, many examples of mathematical formulas, combining them in many different complicated ways, creating source code out of them, compiling that to binaries. And then we create the correspondence so that we can reverse this process so that in the future, when we see a snippet of binary code, we say, oh this looked like the reverse of this example that we had seen before. And this is an element of something known as transfer learning.
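The pipeline described above (formulas turned into source code, compiled, and then indexed so the process can be run backward) can be sketched in a few lines. This is only an illustration under loud assumptions: Python bytecode stands in for native machine code, an exact-match lookup table stands in for a learned model, and the names FORMULAS, to_source, to_binary, build_correspondence and recover are hypothetical, not anything from GrammaTech's system.

```python
# Illustrative stand-in for the formula -> source -> binary pipeline.
# Assumption: Python bytecode plays the role of a compiled binary.

FORMULAS = ["a * x + b", "a * x * x + b", "(a + b) * x", "a / (x + b)"]


def to_source(formula: str) -> str:
    """Wrap a formula in a tiny function, playing the role of source code."""
    return f"def f(a, b, x):\n    return {formula}\n"


def to_binary(source: str) -> bytes:
    """Compile the source and return the raw bytecode of f."""
    module_code = compile(source, "<generated>", "exec")
    # The function body is stored as a nested code object among the constants.
    func_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))
    return func_code.co_code


def build_correspondence(formulas):
    """Record which formula produced which bytes, so the step can be reversed."""
    return {to_binary(to_source(f)): f for f in formulas}


table = build_correspondence(FORMULAS)


def recover(binary: bytes):
    """Look up the formula for a snippet of binary; None if never seen."""
    return table.get(binary)


if __name__ == "__main__":
    snippet = to_binary(to_source("a / (x + b)"))
    print(recover(snippet))
```

An exact lookup like this only recognizes binaries it has already seen, which is why the approach described in the interview relies on training rather than a table.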
Tom Temin: And let me ask you this, a lot of agencies are dealing with modernization of code and systems that are not necessarily industrial controls or controlling the physical world, but are just logic systems that may have been programmed many, many years ago. Can this methodology potentially help with modernizing that type of code?
Dr. Alexey Loginov: Absolutely. The world is full of binary code that was created many years ago, in many cases many decades ago. The DoD, for instance, hangs onto systems that operate for decades, and the people who created that code are long gone. And the problem is, how do you modernize it to take advantage of new resources available or, much worse, to deal with new attack surfaces that are discovered?
Tom Temin: Sure. So what you have described is a way of maintaining it, and that is to say that non-coders could maintain it and make changes. Could this be extended to replacing code with modern code in such a way that you don’t have that black box anymore?
Dr. Alexey Loginov: Absolutely. Much of the effort of GrammaTech is focused on binary analysis, as we call it. An example is finding that a component inside a system is, let’s say, an open source component in which a CVE, a dangerous vulnerability, has been discovered. We can find that fully automatically. In fact, we even have a commercial tool we started marketing recently called CodeSentry. And then we have technology that can allow us to snip that out and replace it with a more modern, safer version.
Tom Temin: Got it. Because I’m thinking of agencies like the IRS, which has assembler code in large quantities. And when they tried to convert it to another more modern language, all they got was kind of an emulation of it, but not something they could really easily maintain. And so that becomes the roadblock here.
Dr. Alexey Loginov: Right. This can definitely be done better. With appropriate research and more modern technology, yes, one can lift it to a higher level, replacing it potentially with more modern source code that does the same thing.
Tom Temin: Now your relationship with DARPA is a contract, it’s not a grant or a research type of project, you’re really going to deliver a system that can do this?
Dr. Alexey Loginov: Absolutely. So DARPA has had a significant focus on trying to accomplish things. And they’re focusing on providing contracts to ensure that the systems get built that actually can help people.
Tom Temin: Because in DoD, we’ve talked about COBOL, we’ve talked about assembler and so on, even old Java for that matter. Some of that is getting pretty long in the tooth. But in the Defense Department, they have languages, to my knowledge, still running, like Ada and JOVIAL and all these really obscure languages developed to control systems such as weapons systems and fire control systems. Are they looking, do you think, to preserve those, or to finally replace that logic with new code?
Dr. Alexey Loginov: I think in due time, I would expect that all of that will get replaced. Now, what’s interesting is that, given how long these systems will last, they need to replace it with, so to speak, very modern C/C++, but that then-ancient C/C++ will still be running 30 years from now. But Ada is going away little by little, and so is JOVIAL.
Tom Temin: Yeah, because, what do they say, the final pilots of the B-52s, for example, are in their infancy now. They’re still on bottles and diapers. And so you’ve got to think long term.
Dr. Alexey Loginov: Exactly, exactly.
Tom Temin: Anything else we need to know about this? It sounds fascinating. I mean, how do you go about this? Do you have people coding AI algorithms that will then look at code?
Dr. Alexey Loginov: One of the big focus points for us is actually to find representative samples of CPS systems, and then find lots of representative collections of mathematical formulas that one would like to play with, and then apply the training. So much of AI and machine learning is about finding a representative corpus of data and applying training on this ground truth, so to speak, the information you know for sure. That’s one of the key steps in applying AI and ML.
Tom Temin: You keep having to work the algorithm against the sample system until the binaries come out the same as the original. Or am I simplifying?
Dr. Alexey Loginov: Maybe slightly, but it’s close to accurate. You keep trying to apply it to many different samples, and you say, this mathematical formula compiles to this binary. Now, let me train a system to go from that binary and get back that mathematical formula. And if it’s able to reverse the process through training on many, many, many different samples, let’s hope that the first time you see a brand new thing that you hadn’t experienced before, you’ll come very close, if not be perfectly correct.
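The idea of training on known formula-to-binary pairs and then guessing the formula behind a binary never seen before can be illustrated with a toy nearest-neighbor matcher. This is a deliberately simplified substitute, not transfer learning and not the project's method: Python bytecode again stands in for a binary, similarity of opcode sequences stands in for a trained model, and TRAINING, opcode_tokens, train and predict are hypothetical names.

```python
import dis
from difflib import SequenceMatcher


def opcode_tokens(formula: str) -> list[str]:
    """Compile a formula and keep only its instruction names (constants dropped)."""
    code = compile(formula, "<generated>", "eval")
    tokens = []
    for ins in dis.get_instructions(code):
        # Newer interpreters use one BINARY_OP opcode for every operator,
        # so keep the operator symbol to tell "+" from "*".
        if ins.opname == "BINARY_OP":
            tokens.append(f"{ins.opname}:{ins.argrepr}")
        else:
            tokens.append(ins.opname)
    return tokens


# Ground truth: each formula family with a few concrete samples.
TRAINING = {
    "a*x + b": ["2.0 * x + 3.0", "0.5 * x + 9.0"],
    "a*x*x + b": ["2.0 * x * x + 3.0", "0.5 * x * x + 9.0"],
    "a / (x + b)": ["2.0 / (x + 3.0)", "0.5 / (x + 9.0)"],
}


def train(training):
    """'Training' here is just storing (opcode sequence, label) pairs."""
    return [(opcode_tokens(s), label)
            for label, samples in training.items() for s in samples]


def predict(model, formula: str):
    """Return (similarity, label) of the closest known sample."""
    unseen = opcode_tokens(formula)
    return max((SequenceMatcher(None, toks, unseen).ratio(), label)
               for toks, label in model)


if __name__ == "__main__":
    model = train(TRAINING)
    # Constants 7.5 and 0.25 never appeared in the training samples.
    print(predict(model, "7.5 * x + 0.25"))
```

Because the matcher ignores constant values, a sample with new constants still maps back to its formula family, which is a small-scale version of the "come very close" behavior described above.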
Tom Temin: Got it. And a final question. We touched upon this briefly, but are there cybersecurity enhancement possibilities in this technique?
Dr. Alexey Loginov: Definitely. It can find some potential escape hatches that shouldn’t be there. You can find that, let’s say, a mathematical formula has some funny discontinuity. Let’s say it computes that if the temperature is below some number, something good happens, if it’s above, something good happens, and in this magic hole, something scary happens. So this is just a silly example I can think of off the top of my head, but if you find those problems, they could be taken advantage of.
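The "magic hole" in the temperature example can be made concrete. The sketch below assumes the recovered control logic has already been expressed as a list of temperature intervals, each with an action; the thresholds, the action names and the helper find_gaps are all invented for illustration.

```python
# Hypothetical recovered rules: (low, high, action), covering [low, high).
RULES = [
    (-40.0, 18.0, "heat_on"),   # below 18 degrees: something good happens
    (24.0, 120.0, "cool_on"),   # at or above 24 degrees: something good happens
]


def find_gaps(rules, lo, hi):
    """Return the sub-ranges of [lo, hi) that no rule covers."""
    gaps = []
    cursor = lo  # everything below cursor is known to be covered or reported
    for start, end, _action in sorted(rules):
        if start > cursor:
            # Nothing handles the stretch between cursor and this rule.
            gaps.append((cursor, min(start, hi)))
        cursor = max(cursor, end)
        if cursor >= hi:
            break
    if cursor < hi:
        gaps.append((cursor, hi))
    return gaps


if __name__ == "__main__":
    # The operating range of -40 to 120 degrees is an assumption.
    for low, high in find_gaps(RULES, -40.0, 120.0):
        print(f"no defined behavior between {low} and {high} degrees")
```

A gap reported this way is only a candidate for review. Whether it is exploitable depends on what the real system does with inputs in that range.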
Tom Temin: Alexey Loginov is Vice President of Research at GrammaTech. Thanks so much for joining me.
Dr. Alexey Loginov: Thank you very much.