Congress has given the Defense Innovation Board the gargantuan task of figuring out how to fix the ways in which DoD buys and develops software. The process is widely regarded as too slow, too expensive and reliant on long-outdated practices.
But beyond those general presumptions, one of the first things the board has found as it’s set about the project is that DoD keeps very little data about its own software projects. And since that’s exactly the kind of information one would need to pinpoint where the problems lie, the board has made collecting it its first order of business.
“We don’t really have any good ways of measuring software within the Department of Defense, beyond numbers of lines of code,” said Dr. Richard Murray, a board member who is a professor at the California Institute of Technology. “And every time I see the chart that shows exponential number of lines of code, I go, ‘Why are you measuring that?’ It’s a great way to get lots of lines of code. And so, what should the metrics be? What should we be measuring?”
During a quarterly meeting on Tuesday, board members began deliberating over more than a dozen potential ways to track software development. The proposals rest partly on the belief that new measurements will incentivize different behaviors in the government and contractor workforces that write software for the military.
Nearly half of the draft metrics — the early workings of a report the board will deliver to Congress next April — have to do with speed.
They would, for instance, require DoD development programs to track the time it takes from the start of a project to when developers deliver “simplest useful functionality.” They would also require tracking the time to remedy newly discovered security holes, and how long the military takes to actually deploy new software updates into an operational environment after they’ve been written and tested.
Those metrics would also include “target values” for the department. The board suggests commercial off-the-shelf software should see at least some operational use within a month after the program is launched. But highly customized software that also needs to run on customized hardware, including embedded systems, could shoot for targets of less than a year.
“And the key is those numbers have nothing to do with the current three-to-five years, which is typical for the Department of Defense,” said board member Michael McQuade, a former United Technologies executive. “Fundamentally, this is an environment where spending a long time getting the requirements right, then spending a long time getting someone to agree the requirements are right, then spending a long time getting someone to agree that contractually I can meet the requirements is simply the wrong metaphor for how we go about doing software.”
Several other proposed metrics have to do with the quality of the software the military uses. How much automation are developers using to quickly verify that their code is correctly written? How many bugs are caught during the development process, rather than by end users in the field? How often does DoD have access to the source code it has paid contractors to develop, so that it can inspect that code and roll back to earlier versions if necessary?
On the latter metric, the board said its review of DoD software programs found that the department almost never has that access.
“That’s because of the way we incent and contract with our vendors,” McQuade said. “And it’s our belief that you simply cannot own the delivery of code for a system – whether that’s a commercial system or whether that’s a customized or a real time embedded system – without the ability to go look at the code, to run metrics on the code itself.”
Another set of metrics is meant to improve DoD’s monitoring of the cost, schedule and performance of software programs, ideally leading the department to make more accurate predictions of how long it will take to develop a particular capability once military officials decide it’s needed.
They would track factors such as the number of new software features developers are being asked to implement, what program management methodologies they’re using, how early and often they’re engaging with end users, and how much of the program’s objective can be accomplished by reusing code that’s already written and owned by the government.
“One of the ideas would be to apply a machine learning model to these kinds of statistics if we were collecting them,” Murray said. “That would give us predictions. If someone said ‘we want lots of features,’ that would allow you to both estimate what a program would cost, in terms of number of programmers and other things, but also audit whether the program is on track or whether it’s headed south.”
Former Google CEO Eric Schmidt, the board’s chairman, proposed what he said would be a simpler way to track the effectiveness of program management within DoD.
He said the board should develop a list of the 100-or-so most common programming languages, software development environments and hardware platforms in use throughout the broader IT industry over the past five years. DoD components, he said, should be graded on how often they’re allowing their developers to use those tools rather than long-outdated ones.
“And that would suggest, for example, that anybody who’s using a VAX architecture would get a zero on that scale,” he said. “I think using a list of what’s been current in the last five years is easily audited. It’s easily inspected. If the last use of something outside the DoD was 20 years ago, these are factual things, so in a regulatory environment, you can survive challenges and you can tell people we’re going to grade you on that achievement. I can tell you that the majority of the teams that we’ve met with would not meet that criteria, but it’s a simple metric.”