Replacing mountains of code doesn’t have to take forever


Nearly every agency technology modernization effort runs into the same hill: how to replace legacy code, often code written decades ago in obsolete languages. Now the Defense Advanced Research Projects Agency (DARPA) has launched a project to discover ways to replace this code incrementally but steadily. With the details, the program manager in DARPA’s Information Innovation Office, Dr. Sergey Bratus, joined Federal Drive with Tom Temin.

Interview transcript:


Tom Temin: Dr. Bratus, good to have you on.

Dr. Sergey Bratus: Thank you. Great to be here.

Tom Temin: And let’s start with what it is that you see as the methodology because what I’ve heard over the years is either just recode everything, start over, or somehow translate the code into new code, which makes it blossom into many, many more lines of code. And it just seems like this stuff persists no matter what people try to do. So what is the new methodology?

Dr. Sergey Bratus: First and foremost, the size of software is bound to grow. We want to automate more and more complex behaviors, and so that means more software. Now, the second point is that replacing software is not so easy at all. Imagine a large system, imagine a system of many components that continually talk to each other. Now, how do you replace the entirety of that without stopping it? Some systems in our infrastructure, such as the power grid, you can never hope to stop completely. You can only replace it piecemeal, and with very small pieces, but therein lies the problem. When you’re replacing code in a component, your concern is not only about that particular component and what it does, but also about how it interoperates with the rest of the system. So it’s not just the million lines of code, or maybe several lines of code within that million lines, that you want to update. It’s also all the other millions of lines and all the other components whose interactions you need to somehow account for, and this is really the challenge.

Tom Temin: You’ve got a program called V-SPELLS, which stands for verified security and performance enhancement of large legacy software. And what would that do if you get this to operate?

Dr. Sergey Bratus: It addresses some of the missing capabilities that we must address in order to be able to replace software incrementally. I mentioned understanding, or the need to understand, the interactions of a component with other components. But with a large code base, your very first question is, what are the components? How do you identify them? Sometimes it’s not actually an easy task to understand what the modular structure of the system is. Sometimes you want to replace just one particular part of it. Sometimes you want to move that particular part of it to new hardware and take advantage of the better or faster or more secure hardware. And that’s another challenge. Finally, it’s always a risk when you’re replacing software, when you’re editing software. The software that you have running on an infrastructure system has been tested in its entirety. Now, you make any small change, how do you assure that the small change doesn’t break the entire thing? You need automated support for understanding the software that you have and its behaviors. And you want to protect the human making the changes from the typical errors that humans make when they program. It’s easy to, in fact, just look at the line of code, believe it does what you mean it to do and not see that it’s doing something completely different. Automation is needed to address all of these challenges. And V-SPELLS aims to provide the automatic analysis tools to help with all of these.

Tom Temin: So how will you get V-SPELLS to work? Are you looking for industry help? Tell us the nature of this program that you have going.

Dr. Sergey Bratus: This is a combination of fundamental research. There are hard fundamental research questions waiting to be addressed here. But it needs to work for the industry because software is mostly written in industry. So yes, we’re looking for industry involvement and industry connection with some of the fundamental research, promising fundamental research that is happening in these software areas.

Tom Temin: So if I’m the IRS and I’ve got my however many millions of lines of assembly code that I’ve been trying for 30 years to replace and can’t, I’m gonna have to probably wait a little longer before V-SPELLS gives me the operational tool I need?

Dr. Sergey Bratus: We aim to produce real tools. However, it takes time. We don’t, at this point, exactly know what these tools will be able to do, and we need better theories on how they will operate. But this is exactly what DARPA does.

Tom Temin: Why have so many previous attempts by many agencies to get rid of legacy code failed? Is it just because of the challenges you outline? Are they totally insurmountable?

Dr. Sergey Bratus: They are hard mathematical problems. They are hard engineering problems. So breakthroughs are definitely needed in the theory of how we construct our tools. But very often, it’s not economical to get rid of legacy code, because legacy code represents a huge investment of labor and expertise that previous generations of programmers have made. So your question could be, not how to replace it, but how to enhance it, and how to make it so that your enhancement would be safe, or as we say, safely composable with the rest of the system. There are places in code, as any programmer knows, that are safe to change, especially when the software has been constructed in a nice and modular way. And there are places that are troubled, that are very unsafe to change, because the changes there could cascade throughout the behaviors of the system. So understanding the software and telling the first situation from the second is very important.
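One way to picture the difference between a safe place and an unsafe place to change is to ask how many other modules depend, directly or indirectly, on the module being edited. The sketch below is only an illustration of that idea and is not a V-SPELLS tool; the module names and the `impacted_by` helper are hypothetical, and real systems have interactions that a simple dependency map does not capture.

```python
from collections import deque


def impacted_by(changed, depends_on):
    """Return every module that depends, directly or indirectly, on `changed`.

    `depends_on` maps a module name to the set of modules it uses.
    All names here are hypothetical and exist only for illustration.
    """
    # Invert the map: for each module, record who depends on it.
    dependents = {}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk over the inverted edges.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for module in dependents.get(current, ()):
            if module not in seen:
                seen.add(module)
                queue.append(module)
    seen.discard(changed)  # a cycle could otherwise report the module itself
    return seen


# A made-up system: a change to "dates" can cascade, a change to "ui" cannot.
SYSTEM = {
    "billing": {"ledger"},
    "reports": {"billing"},
    "ledger": {"dates"},
    "ui": set(),
    "dates": set(),
}

cascading = impacted_by("dates", SYSTEM)
isolated = impacted_by("ui", SYSTEM)
```

In this toy system a change to `dates` reaches three other modules, while a change to `ui` reaches none, which is the kind of distinction the answer above describes.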

Tom Temin: So many of these legacy systems are subject to regular updating and functionality changes, say the tax code changes or eligibility for some program changes. So people are in it a lot and fiddling with it a lot, and yet it doesn’t seem to break it. So replacing it though then seems to be just a step beyond simply updating it or keeping it current with whatever the program requirements are.

Dr. Sergey Bratus: That’s right. But there are several different scenarios here. In one scenario, there is a local problem. Typically security problems are of that kind, security vulnerabilities are of that kind. And essentially, you forgot to check something. You forgot to check, for example, that a number is within certain bounds. So you add that check and the functionality of the rest of the system may or may not change; you may have misunderstood which check you were missing in the first place. But this is a problem of local changes. When you are enhancing software, changing function, that’s a different and harder problem, because now your functional change may propagate to other modules or might have to be done across modules in a synchronized fashion. That is a harder analysis problem and people really don’t have a good handle on how to do this without extremely extensive testing. And that testing might take months. And sometimes, if a system is high assurance, it might take years. So, our goal is to introduce the necessary automation to make it happen within weeks, if possible. Sometimes it could even happen within a day.
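The local fix described above, adding a forgotten bounds check, can be sketched in a few lines. This is a minimal illustration with hypothetical function names, not code from any system discussed in the interview.

```python
def read_value_unchecked(values, index):
    # Original behavior: the index is trusted. In Python a negative
    # index silently wraps around to the end of the list.
    return values[index]


def read_value_checked(values, index):
    # The local patch: verify that the number is within bounds first.
    if not 0 <= index < len(values):
        raise IndexError(f"index {index} outside 0..{len(values) - 1}")
    return values[index]


readings = [10, 20, 30]
wrapped = read_value_unchecked(readings, -1)  # quietly returns the last item
ok = read_value_checked(readings, 1)
```

Even this small patch shows the caveat from the answer: any caller that relied on the wrap-around behavior now gets an exception, so the rest of the system may or may not keep working as before.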

Tom Temin: This system then could scale such that it would continuously replace, with all of these checks and balances and reassurances happening automatically, so that you could actually have a horizon for having new code.

Dr. Sergey Bratus: That is correct. And what we call this is safe patterns of composition. You compose new code with this system, sometimes even a running system and the network of your computer, and the tooling for your composition. And this is where we get to the heart of the program, what we call the domain specific languages, higher level languages that abstract the details. This method might allow you to write just one line where you would otherwise have to write hundreds. And with that one line, it would be easier to check it, harder to make mistakes, and easier to merge it with the running system using special tooling or automation than in the legacy ways.
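To make the "one line instead of hundreds" idea concrete, here is a toy domain specific language for describing a fixed binary record. It is an assumption made for illustration only: the interview does not describe what the V-SPELLS languages look like, and the field syntax, type names and `parse_record` helper below are hypothetical. The sketch uses Python's standard `struct` module.

```python
import struct

# Hypothetical type names mapped to struct format codes (unsigned integers).
_TYPE_CODES = {"u8": "B", "u16": "H", "u32": "I"}


def compile_layout(spec):
    """Turn a spec such as "id:u16 flags:u8" into field names and a Struct."""
    names, codes = [], []
    for field in spec.split():
        name, type_name = field.split(":")
        if type_name not in _TYPE_CODES:
            raise ValueError(f"unknown field type: {type_name}")
        names.append(name)
        codes.append(_TYPE_CODES[type_name])
    # ">" selects big-endian byte order with no padding between fields.
    return names, struct.Struct(">" + "".join(codes))


def parse_record(spec, data):
    """Parse `data` according to `spec`, rejecting records of the wrong size."""
    names, layout = compile_layout(spec)
    if len(data) != layout.size:
        raise ValueError(f"expected {layout.size} bytes, got {len(data)}")
    return dict(zip(names, layout.unpack(data)))


# The whole record format is this one declarative line.
HEADER = "id:u16 flags:u8 length:u32"

record = parse_record(HEADER, bytes([0, 1, 2, 0, 0, 0, 5]))
```

Because the format lives in a single declarative line, a reviewer or a tool has one place to check, and the length test is applied uniformly rather than being rewritten by hand for every field.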

Tom Temin: The broad agency announcement is closed now. So you are in the state of waiting for people to respond?

Dr. Sergey Bratus: That was correct. In fact, the proposals have been submitted.

Tom Temin: So you’re in source selection. These would be for contracts or grants?

Dr. Sergey Bratus: Contracts primarily.

Tom Temin: Alright, well let’s hope it comes out well. When would you expect something to be known at this point?

Dr. Sergey Bratus: This is a four-year program. I expect tools considerably improving the state of the art to start appearing after the first year when it starts, and I hope to accomplish demonstrations on real systems by the end of it.

Tom Temin: Dr. Sergey Bratus is program manager in the information innovation office at DARPA. Thanks so much for joining me.

Dr. Sergey Bratus: Thank you for having me.

Read more about the project here.