How to speed-read a billion (classified) pages

In a book playwright Sherman Yellen wrote for an old Broadway musical, he has a character point out that the crowns (of pre-World War I Europe) all rest on credit. Not much has changed in the fiscal affairs of nations.

You might say that sovereigns also rely on secrets. Over the years, I’ve interviewed people from the Information Security Oversight Office, part of the Archives. They are the beleaguered souls who, somehow, pore over millions of documents each year. The pages number in the tens of millions. Some portion of those pages they deem worthy of declassification.

ISOO Director Mark Bradley tells our Jason Miller the process and the systems supporting it are antiquated. He calls the whole classification-declassification process unsustainable.

I’ve always wondered, how in the heck can any group of people review 83,765,475 pages in a year, and decide 46,041,434 can be declassified? In the most recent 3-year period, ISOO reviewed 273,131,036 pages. It declassified 126,764,623 of them. Too bad, the quota was 126,764,624. Just kidding.
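For a sense of scale, here is a quick back-of-the-envelope check of those figures. It is only a sketch: the page counts are the ones quoted above, while the variable names and the `share` helper are made up for illustration.

```python
# Page counts as quoted in the column; names are illustrative only.
reviewed_year = 83_765_475
declassified_year = 46_041_434
reviewed_3yr = 273_131_036
declassified_3yr = 126_764_623


def share(part: int, whole: int) -> float:
    """Return part as a fraction of whole."""
    return part / whole


rate_year = share(declassified_year, reviewed_year)
rate_3yr = share(declassified_3yr, reviewed_3yr)

# Pages reviewed over three years that stayed classified.
still_classified_3yr = reviewed_3yr - declassified_3yr

print(f"One-year declassification rate: {rate_year:.1%}")
print(f"Three-year declassification rate: {rate_3yr:.1%}")
print(f"Pages left classified over three years: {still_classified_3yr:,}")
```

Roughly half of what gets reviewed comes out the other side declassified, which still leaves well over a hundred million pages behind the curtain.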

Bradley said the process should be turned over to robotic process automation, cloud computing and machine learning.

The robotics part ought to be relatively easy. Ever watch a commercial scanner at work on, say, checks or forms? Such a machine doesn’t doze off, and it doesn’t take coffee or cigarette breaks.

But what about the judgment on which pages should be declassified?

One potential, if cynical, answer is, why not just declassify everything at a certain date? In an age when every cyber secret the National Security Agency ever developed was dumped onto the dark web, what could possibly matter in the 146,366,413 pages ISOO returned to their moldering secrecy?

A serious answer, though, is really a question. Can some sort of mechanically fed, algorithm-driven, machine-reading apparatus automate this work? “Pages” conjures up an image of yellowing, typewritten sheets of paper from the 1940s. In reality, I’m guessing the material is a blend of those and pictures, emails, diagrams, PowerPoints, and a zillion other types of “pages.”

ISOO presumably takes its work seriously, so it can’t just rubber-stamp stacks of documents one way or the other. The work doesn’t sound like a speed-reading exercise.

In his letter to the president, Bradley points out other ways to make the classification system more tenable. One is to classify less information. Another is for agencies to make greater use of the “CUI” designation, short for controlled unclassified information. Bradley asks the president to intervene and press the issue.

Bradley points out the classification system costs more than $18 billion a year to operate. It seems to me the key is training an AI system coupled to machine reading. Training it would require feeding it the same documents expert people are reading. When the people and the machine reach, say, 85 percent matching decisions, then turn over the task to the machine. Maybe the machine could be trained to flag documents it can’t decide on. People would handle only the subtle cases.
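To make that idea a little more concrete, here is a minimal sketch in Python. It is not how ISOO or any real system works. The 85 percent target comes from the paragraph above; everything else, including the function names, the confidence score and the 0.2 and 0.8 cutoffs, is an assumption made up for illustration.

```python
from typing import Sequence

# The "say, 85 percent" agreement target from the text.
AGREEMENT_TARGET = 0.85


def agreement_rate(human: Sequence[bool], machine: Sequence[bool]) -> float:
    """Fraction of pages where the machine matched the human reviewer.

    True means "declassify", False means "keep classified".
    """
    if len(human) != len(machine) or len(human) == 0:
        raise ValueError("need two non-empty decision lists of equal length")
    matches = sum(h == m for h, m in zip(human, machine))
    return matches / len(human)


def route(score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Route one page by a hypothetical model score between 0 and 1.

    The score stands for the model's estimate that the page can be
    declassified. The low and high cutoffs are illustrative assumptions.
    """
    if score >= high:
        return "declassify"
    if score <= low:
        return "keep classified"
    return "human review"  # the subtle cases go back to people


# Toy trial: ten pages, with the machine disagreeing on one of them.
human_calls = [True, False, True, True, False, True, True, False, True, True]
machine_calls = [True, False, True, True, False, True, False, False, True, True]

rate = agreement_rate(human_calls, machine_calls)
ready_for_handoff = rate >= AGREEMENT_TARGET
print(f"Agreement: {rate:.0%}, hand off to machine: {ready_for_handoff}")
```

The middle band is the whole point. Widen it and people see more pages; narrow it and the machine decides more on its own. Where to set those cutoffs is a policy call, not a programming one.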

Speed-reading by people won’t do. Woody Allen may have read War and Peace in 20 minutes and concluded “it’s about Russia.” But in the world of classification there’s got to be a better way.