How can federal agencies be expected to budget and plan for worst-case scenarios that are statistically unlikely to occur? It’s a difficult challenge but, as the coronavirus has demonstrated, a necessary one, which is why the Government Accountability Office published a Disaster Resilience Framework.
Vijay D’Souza, GAO’s director of Information Technology and Cybersecurity, said agencies have to consider their various business processes and what could impact them, then what can be done to offset those impacts and keep operations moving smoothly. A continuity of operations plan (COOP) should cover HR, records management and payroll, but IT underlies all of those.
“They basically need to start out by identifying the risks to their systems,” he said on Federal Monthly Insights – Operational Resiliency. “I mentioned a natural disaster or it can be something like a pandemic – what you have to do is think about how each of those risks would affect each of the particular systems you have. Then what you need to do is you need to develop a mitigation strategy and figure out, you know, what would you do?”
Some options might be an alternate computing facility or cloud computing. But because cloud vendors have a variety of backup operations across multiple facilities, agencies must consider related staffing, D’Souza said.
“So, for example, if your staff were ill or unable to get to work, how would you run things? One of the things a lot of agencies are running into now is a lot of agencies are telework-friendly. There are certain IT functions that can only be done in the building, for example, maintenance, replacing equipment that fails, repairing things, applying certain patches. So that’s a bit of a challenge,” he said on Federal Drive with Tom Temin.
That’s why, D’Souza said, having a disaster recovery or continuity of operations plan for each IT system is one of the federal IT security requirements.
That said, it’s nearly impossible to plan for the scope and nature of a pandemic until it happens. And although periodic testing is a requirement, D’Souza said that’s difficult to accomplish at 100% capacity. But he said several agencies performed stress tests before moving to full telework.
“You know, the tradeoff there is you don’t want to necessarily be paying for a capacity that you’re not using except, you know, once in a long while,” he said. “So it’s always a tradeoff as far as how much you want to pay for to keep that capacity in reserve.”
The other problem is that certain pieces of equipment for disaster situations are needed by everyone – webcams being an example this time around. D’Souza said he was lucky to have bought one coincidentally just before the pandemic, but for the rest of his agency they were few and far between. And having contracts with vendors to be on call in times of disaster is ideal, but when that situation occurs, your agency will be one of dozens of customers asking for the same assistance simultaneously.
“So part of what you’ll have to do as an agency is identify, kind of, what are your primary essential functions and then what are some things that could wait or be done later or done by alternative means?” he said.