Less than a year from now, households across the country will receive an invitation to respond to the 2020 census online. Because the Census Bureau expects more than half of respondents to answer the decennial count through its website, it’s going to great lengths to ensure the site can withstand a major volume of traffic.
Michael Thieme, the assistant director for decennial census programs, systems and contracts, speaking Thursday at a Census Scientific Advisory Committee meeting, said his team’s “biggest focus” right now is on performance and scalability testing for internet response.
“We need to know in advance how big our problem is — how many people will respond to the census, how many people in a single second are going to be hitting our internet response application, how many enumerators out in the field are going to be hitting our servers from an app from an iPhone,” Thieme said.
Based on models the bureau has worked on for the past six years, Thieme said he expects up to 120,000 concurrent users to respond to the census online at any one time. He added that the bureau has built its IT architecture to handle up to 600,000 concurrent users, even though it remains unclear whether the site will ever encounter that volume of traffic.
“It’s kind of like the Brooklyn Bridge, you build it five times stronger than you need it to be, and it’ll last a hundred or 200 years,” he said.
The bureau has already completed phase one of its scalability and performance testing, which Thieme described as a “paper exercise” where IT architects and engineers lay the groundwork to increase its IT capacity and take inventory of what’s already been built.
Work remains in progress for phase two, which consists of a “unit test” aimed at figuring out how many people can concurrently access the internet self-response application using only one “cluster” of servers.
“That gives us a figure that we can then scale over and over again to hit where our actual workload should be,” Thieme said.
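The arithmetic behind that approach can be illustrated with a minimal sketch. It is not the bureau's tooling. The 120,000 and 600,000 figures come from Thieme's remarks, while the per-cluster capacity of 25,000 users is a made-up placeholder, and the function names are hypothetical. The sketch also assumes capacity grows linearly as identical clusters are added, which is the kind of assumption end-to-end testing is meant to check.

```python
import math

# Figures quoted in the article.
EXPECTED_PEAK_USERS = 120_000    # modeled peak of concurrent users
DESIGN_CAPACITY_USERS = 600_000  # concurrent users the architecture is built to handle


def headroom_factor(capacity: int, expected: int) -> float:
    """Ratio of built capacity to expected peak load."""
    return capacity / expected


def clusters_needed(target_users: int, users_per_cluster: int) -> int:
    """Clusters required for target_users, assuming capacity scales linearly."""
    if users_per_cluster <= 0:
        raise ValueError("users_per_cluster must be positive")
    # Round up: a partially used cluster still has to be provisioned.
    return math.ceil(target_users / users_per_cluster)


# Hypothetical unit-test result; the article does not report a real number.
measured_users_per_cluster = 25_000

print(headroom_factor(DESIGN_CAPACITY_USERS, EXPECTED_PEAK_USERS))        # 5.0
print(clusters_needed(EXPECTED_PEAK_USERS, measured_users_per_cluster))   # 5
print(clusters_needed(DESIGN_CAPACITY_USERS, measured_users_per_cluster)) # 24
```

With the placeholder figure, the expected peak would need five clusters and the full design capacity would need 24. If interactions between systems make scaling less than linear, the real counts would be higher.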
Progress is also underway on phase three, which tests operations “end-to-end” to determine whether interactions between systems affect scalability. Thieme said “very little” has been completed in phase four, where the team tries to “break” its systems with a maximum workload.
As the bureau finalizes its performance and scalability testing, it will also test its cloud deployment architecture.
“If we find out that something we built doesn’t scale to the way we wanted it to scale, we might have to change how our cloud structure is set up, we might have to add more,” Thieme said. “We have to essentially employ our scalability techniques in the cloud, and that’s one of the big reasons why cloud is so attractive for something like the census.”
Census Chief Information Officer Kevin Smith said the bureau has tested all the components within the cloud, and is “satisfied with the numbers we have” from those tests.
“The final test we have is looking at when people from your home computer go to the internet self-response site, what is the rate of latency, potentially, that may be an issue,” Smith said. “We’re talking about milliseconds here and there, but we’re going to test it to see so we know exactly what the user response is.”
Based on those end-user tests, Smith said there “may be some small tweaks” to make, but added that the bureau remains “confident” in the scalability of the core system.
The decennial count remains a top priority for the bureau’s agency partners as well. Thieme said officials from the Department of Homeland Security, during a recent closed-door briefing, told congressional staff that the 2020 census will receive as much support from DHS as the 2020 election.
Smith added that the bureau is “absolutely aware” of cybersecurity threats.
“We are right now working through the coordination of how to best take the federal intelligence community’s current processes and procedures, which we’re providing data to, for them to look for foreign threats, look for foreign adversaries within social media or within direct threats to Census technology,” Smith said.
Ali Ahmad, the associate director for communications, said the bureau is working with companies including Facebook, Google and Microsoft on a “rapid response” capability to counter disinformation campaigns percolating on social media and the dark web.
“One of the main battlefronts we have is the search space. By the time we’re motivating people to respond to the census, the top thing that should come up when they Google ‘census’ should be a safe place to get information about the census,” Ahmad said.
For all the bureau’s work on cybersecurity and rooting out disinformation, Jay Breidt, a statistics professor at Colorado State University and a member of the CSAC, urged the bureau, through its communications partnerships, to reassure the public that their responses to the census will remain secure.
“My sort of immediate reaction to that is making further security or confidentiality innovations won’t help in any way. The idea is to try and convince people that these security and confidentiality procedures already in place make any difference,” Breidt said. “That’s a hard problem to do.”
According to the Census Barriers, Attitudes and Motivators Study (CBAMS), 28 percent of respondents said they were “extremely concerned” or “very concerned” about data privacy, while 24 percent expressed similar concerns that the Census Bureau would share response data with other federal agencies, which remains prohibited under federal law.