The Federal Aviation Administration assured Congress Wednesday that it’s taken steps to avoid a repeat of last month’s air traffic meltdown. But the agency’s leader said it will take at least another two years before the aging IT infrastructure at the heart of the issue is fully replaced with more modern technology.
The FAA messaging system that issues Notices to Air Missions (NOTAMs) went offline on the evening of January 10th, and wasn’t fully restored until 9 a.m. the next morning. While it was offline, the FAA ordered a complete halt to all aircraft departures nationwide for nearly two hours, the first time the U.S. has seen a nationwide ground stop since the Sept. 11 attacks.
The NOTAM system is important enough that it has multiple backups designed to take over when the main system fails. But the FAA said those backups failed too.
The agency said the root cause was a contractor inadvertently deleting essential files from the main database during a system update. And because of the way the system is architected, those file deletions immediately replicated themselves to the backup systems, rendering them unusable as well.
Billy Nolen, the acting FAA administrator, said the agency has since put processes in place to keep that precise problem from recurring.
“We have instituted a one-hour synchronization delay between the primary database and the backup database that gives us time to make sure that we have no issues there,” he told the Senate Commerce Committee Wednesday. “Secondly, we’ve increased the level of oversight to ensure that more than one person is available when work or updates are being done on the live database, along with up-leveling our level of oversight within the command center to ensure that we’ve got leadership present.”
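To make the idea of a synchronization delay concrete, here is a minimal Python sketch of a backup that holds each change for an hour before applying it. It is an illustration only, not the FAA's implementation: the `DelayedReplica` class, its method names and the sample record are all hypothetical, and the only detail taken from the testimony is the one-hour window.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedReplica:
    """Hypothetical backup store that applies primary changes after a delay."""

    delay_seconds: float = 3600.0  # one-hour window, per the testimony
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # (timestamp, op, key, value)

    def enqueue(self, timestamp, op, key, value=None):
        # Record a change made on the primary; it is not applied yet.
        # Changes are assumed to arrive in timestamp order.
        self.pending.append((timestamp, op, key, value))

    def apply_due(self, now):
        # Apply only the changes that have waited out the full delay.
        still_pending = []
        for ts, op, key, value in self.pending:
            if now - ts >= self.delay_seconds:
                if op == "put":
                    self.data[key] = value
                elif op == "delete":
                    self.data.pop(key, None)
            else:
                still_pending.append((ts, op, key, value))
        self.pending = still_pending

    def discard_since(self, timestamp):
        # Operator action: drop queued changes made at or after a bad update.
        self.pending = [c for c in self.pending if c[0] < timestamp]


replica = DelayedReplica()
replica.enqueue(0, "put", "notam-1", "runway closed")
replica.apply_due(3600)            # an hour later, the record reaches the backup

replica.enqueue(4000, "delete", "notam-1")  # mistaken deletion on the primary
replica.apply_due(5000)            # too soon: the backup still has the record
survived_delay = "notam-1" in replica.data

replica.discard_since(4000)        # mistake caught inside the window
replica.apply_due(10000)           # nothing left to apply
survived_discard = "notam-1" in replica.data
```

In this sketch the mistaken deletion never reaches the backup because someone discards it within the hour. If nobody notices in time, the deletion is applied once the delay expires, which is why a delay alone does not amount to a fully independent backup.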
But Nolen told the committee there’s no way to guarantee that some other problem won’t sideline the aging system in the coming months and years.
“When we think about the age of our systems, we do have redundancy there. Could I sit here today and tell you there will never be another issue on the NOTAM system? No, I cannot,” he said. “What I can say is that we are making every effort to modernize and look at our procedures. Part of this investigation has us working with the MITRE Corporation and other entities to look across the totality of our systems, how they interrelate, what the level of redundancy is, and is there any additional thing that we need to do. We’ll certainly have more to say as that investigation ensues.”
The NOTAM system, which distributes operational safety messages to every pilot and airspace user in the country, is actually two separate systems.
First, there’s the 30-year-old legacy version, which the FAA calls the “U.S. NOTAM System.” Since 2009, the agency has been working to replace it with a new version, called the “Federal NOTAM System.” About 80% of the aviation industry has already been migrated to the new system, but about 20% of users, including the entire Defense Department, are still on the legacy edition.
Because such a large proportion of the aviation community still relies on the old system, a failure there affects the entire country. Nolen said the FAA doesn’t expect to move everyone to the Federal NOTAM System until 2025.
But Nolen said even that schedule, and the rest of the agency’s IT modernization plans, depend on adequate and on-time appropriations and authorizations from Congress.
“It’s all about ensuring that we have the funding there, and we’ll look forward to what comes forward in the President’s budget. Our goal is to take every dollar that we are given and be good stewards of that as we move forward to modernization,” he said. “But we’re talking thousands of systems. NOTAM is a big one, but we don’t want to leave the committee with the impression that once we fix NOTAM, we’re done.”
In the meantime, Nolen said part of the investigation into the January failure is looking at whether the NOTAM system should be formally designated as a “safety critical” system. As of now, it has a lower-level designation: “mission support.”
“We’re taking a look at the classification to make sure we’ve got it right,” he said. “Some of the differences are just the levels of engineering controls that you’d have in place for a critical system. Those also have added levels of redundancy that you’d expect to have given their criticality.”
But Congress wants improvements to the NOTAM systems sooner than 2025.
Legislation the House passed in late January would order the FAA to stand up a task force to look at near-term improvements to the resiliency and cybersecurity of the system. A similar bill is pending in the Senate.
Meanwhile, Sen. Maria Cantwell (D-Wash.), the Commerce Committee chairwoman, said Wednesday the FAA needs to tell Congress what it can do right now to make the system more redundant.
“I want to get an answer within a week about the NOTAM system having a totally separate backup that could be used,” she told Nolen. “You’re now trying to put human redundancy there so that this won’t happen again. But if the same system is a network, including the backup servers in other places, whatever action somebody mistakenly takes on files still affects the whole system. Can the FAA set up a truly redundant system that would allow for the file corruption that happened not to happen across the entire system? That’s what we need to know the answer to.”