State Department declassifies diplomatic cables using AI assistant

The State Department tool cost about $400,000 to develop and is at least 97% as accurate as humans, according to an agency leader.

A directive to U.S. embassies in India and Pakistan requesting an urgent evaluation of economic and financial vulnerabilities in those countries.

A report from the embassy in Sofia detailing discord in the Bulgarian Socialist Party.

And an internal summary, prepared by the U.S. embassy in Pretoria, of Secretary of State Madeleine Albright’s meeting with President Nelson Mandela in South Africa.

Those messages are among dozens of newly released diplomatic cables from late 1997. The State Department declassified the cables using a machine learning tool developed by the agency over the past year. The cables were not subject to any Freedom of Information Act requests, but State officials determined copies of the documents could be publicly released through the “proactive disclosure” provision of FOIA.

Eric Stein, the deputy assistant secretary for the Office of Global Information Services, called it “the first ever proactive disclosure of previously classified records . . . using machine learning and AI.”

The State Department trained the machine learning model on years of declassification decisions. The model is now 97% as accurate as humans in determining whether a record should be declassified, Stein said during an Oct. 5 event hosted by the Digital Government Institute.

“And some of those 3% issues weren’t even review decisions,” he added. “They were actually data quality issues or other challenges.”

Stein said the declassification pilot also relies on human review throughout the process to ensure the system ultimately makes the right decision.

It cost about $400,000 to develop and train the machine learning tool, which Stein described as a major cost savings compared with hiring more people to manually review records. The State Department has now “fully operationalized” the tool as part of its 25-year declassification program, with the 1997 records serving as the first public example.

For years, officials who oversee classification policies have highlighted the need for agencies to use technology, like automation and machine learning, to keep up with an ever-expanding deluge of electronic records.

The former head of the Information Security Oversight Office at the National Archives called it a “tsunami of digitally created classified records” and said in his final report to the president that “it will be a mammoth task to turn these tidal waves.”

In a landmark 2020 report, the Public Interest Declassification Board (PIDB) also recommended agencies use machine learning and artificial intelligence to automate both the classification and declassification processes.

And more recently, a major push in Congress to overhaul the government’s classification system also features a goal to use technology to support declassification reviews.

Meanwhile, agency FOIA programs have likewise struggled with limited resources to keep up with a record-high number of requests and review an expanding set of electronic records.

The State Department’s pilot may serve as a bellwether for broader efforts aimed at streamlining declassification and improving FOIA. State is also exploring the use of AI to help streamline the FOIA process.

“These are proactive steps that we’re taking with technology to increase transparency at our agency, and work that can be done in other agencies as well, if they have the technology and ability to do so,” said Stein, who is also co-chairman of the Chief FOIA Officers Council’s Technology Committee.

One of the important lessons so far, he said, is having “good quality records and data.” He also recommended agencies start with a small subset of records, as the State Department did with diplomatic cables.

The State Department is now considering expanding the machine learning declassification tool to email and other record types. But Stein acknowledged that how well machine learning applies varies across different data types.

“It comes down to focusing on a specific set of data, records that have data standards that the technology can use to sort and identify,” he said. “And it gets more challenging as we look at different record types, whether it be PDFs, photos, JPEGs, videos, and we start looking at all of these different types of records that are out there. The technology starts to struggle a little bit.”

Copyright © 2024 Federal News Network. All rights reserved.
