For an audience of more than 275 million people in 100-plus countries, speaking 47 different languages, Voice of America has a serious need for transcription and translation services. In response, the broadcaster built IPSUM, an artificial intelligence tool to do just that for nearly 1,800 hours of radio and television each week.
Jim Tunnessen, VOA chief information officer and chief digital officer, said VOA found that machine learning could help in this process, but dialects, speech patterns and language behaviors are more random than consistent. With machine learning, however, the more the technology is used, the better it becomes, he said.
“The motivation behind this was time to be honest,” Tunnessen said on Federal Monthly Insights — Intelligent Automation. “We’re such a large service and we have 47 different languages, we have transcribers and translators in the services that are doing this on a daily basis. And we had some tech-savvy journalists themselves who would reach out and in order to expedite the process, they would subscribe to engines … outside of the building and to subscription services, and run transcription translations through there.”
Tunnessen told Federal News Network Executive Editor Jason Miller that to build IPSUM, VOA reviewed those third-party transcription services and tested the top five products against native language speakers.
“And then we looked for how we would tie this into the organization. And we need a user-friendly front end to drive these engines and to provide it for video and audio transcription and also translation,” Tunnessen said on Federal Drive with Tom Temin. “And so we built this, we mocked up an MVP, or minimum viable product, which we developed within about a three-month timeframe, and tied the engines to the back end.”
VOA started with its Russian, Persian, Spanish, English and Mandarin services, tested IPSUM, and added languages throughout the building process. The program now supports more than 20 transcription languages and more than 40 translation languages. Tunnessen attributed that growth to the effectiveness of the AI on the back end. VOA came up with the idea of aggregating everything together and built the front end itself, with commercial engines on the back end. Using just one commercial engine would not have worked because of the complexity of all the different languages needed for translation and transcription, Tunnessen said.
“Within the system itself you can adjust and change the correctness of the vocabulary, of the script as … it’s translated through its translation or its transcription. We made it a side by side editable field so that when you did — for instance — a translation, you would see Russian on one side, you would see English on the other side and that way you could easily read and make sure that the context was correct. And then you edit in that fashion and submit it back into the engines,” he said.
VOA also feeds IPSUM’s engines with common terms. But the broadcaster still needs quality control and a human involved for the AI to work effectively. Tunnessen said that while some feared their jobs would be replaced by AI, the new tool could in fact make their workflow easier.
“They were much happier with the release of the system,” he said. “With that, we saw major time savings as well as financial savings.”