As agencies transition to strictly digital records, there’s plenty of paperwork to go around

Many agencies are turning to new mechanisms for processing those paper documents into usable data.

As agencies are in the middle of transitioning to strictly digital records, there is still plenty of paperwork to go around, and many of them are turning to new mechanisms for processing those paper documents into usable data. Our next guest is in the business of helping them do that. We welcome Brian Weiss, Chief Technology Officer for the AI software provider Hyperscience, to the Federal Drive with Tom Temin.

Interview transcript: 

Tom Temin  Agencies have mostly transitioned to digital records, yet they still have plenty of paperwork. Some have turned to technology for converting paper into machine-readable data. For more, Federal News Network’s Eric White spoke with Hyperscience Chief Technology Officer Brian Weiss.

Brian Weiss  We’re at a really interesting inflection point right now, because all of the legacy technologies have tried to do what technology always tries to do, which is understand noisy human information, right? It’s the interface between machines and people, and there’s no better evidence of how difficult that problem is than trying to make a machine understand handwriting. Our brains are very good at visually understanding, quickly, what we’re looking at and what it means. But getting a machine to do that is actually very, very difficult. Historically, the way technology has gone about this is a rules-based approach. Every time you see something, you try to write a rule for what you’re going to see, and if it matches your rule, you can transform that messy information into something digital. Where we are now, and what Hyperscience is really driving, is a revolution in using what we call AI these days, but really it’s machine learning, deep learning, to train models to understand over time, similar to what people do. Because you’re showing them repetition of the variety, they learn to understand the variance. So, you know, if you look at a box of paper, you think it’s very simple to just scan that stuff and understand it, but think about how hard it is to understand handwriting. For a machine to understand it with rules, you would have to spell out every single variant. So, the only way to deal with that complexity is an underlying model that learns over time. You show it lots of examples, and you train it, and that’s really what Hyperscience has done. We’re roughly an eight-year-old company, and the first four years of that were all about building a framework for harnessing models that understand information the same way humans do, but doing it in a way where the data you’re using and the information you’re touching is highly secure.
So, we cut our teeth, like I say, on back-office automation and building models for the financial services industry, where that data has to be incredibly secure, and then also for governments. Take organizations like the VA, where we actually process a billion documents these days, and it’s every claim going through the VA for our veterans. That data has to live inside the environment. You can’t hand it off to a GPT model or a third party. So, what Hyperscience has done is crack the code for enabling very high-performing, human-like recognition of information, and we get human-level accuracy out of the system these days. What does that mean? Just to put it in pure numbers, legacy technology approaches to this usually cap out at about 60% accuracy. So, if that’s the case, what happens to the other 40%, right? Right now, it goes to people. Somebody has to sit down, pick up the form, look at a screen, and key in what they’re seeing to make it readable by the machine. So, 40% of that workload is going to people. That is a ripe opportunity for a technology transition. At Hyperscience, we’re delivering almost a 99% accuracy rate, because the models keep getting smarter: the more they see, the better they get. So the efficiency gains are monstrous, and it really is a technology revolution. It’s not incremental at all. It’s apples and airplanes, not apples and oranges. So the benefits are really huge, and this didn’t happen overnight. We’ve been doing this for eight-plus years, and we’ve very much cracked that code internally for how to drive that efficiency.
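As a rough illustration of the workload arithmetic above, the Python sketch below assumes that every item the machine does not get right is keyed in by a person, and uses a hypothetical batch of 10,000 items. The function `manual_items` is an illustrative name, not part of any product.

```python
def manual_items(total_items: int, machine_share: float) -> int:
    """Items left for people to key by hand.

    Assumes (as a simplification) that everything the machine does not
    handle correctly is routed to a human for manual entry.
    """
    if not 0.0 <= machine_share <= 1.0:
        raise ValueError("machine_share must be between 0 and 1")
    # round() absorbs floating-point noise such as 1 - 0.99 = 0.010000000000000009
    return round(total_items * (1.0 - machine_share))


batch = 10_000                       # hypothetical batch size
legacy = manual_items(batch, 0.60)   # legacy ceiling cited in the interview
modern = manual_items(batch, 0.99)   # near-99% figure cited in the interview
print(legacy, modern)
```

Under this simplifying assumption, the 60% ceiling leaves 4,000 of 10,000 items for manual keying, while 99% leaves 100, a 40-fold reduction in hand-keyed work.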

Eric White  If you could just kind of explain to me, how does this work? I mean, going back, I remember the scannable-document tools where you’d insert a piece of paper into either a scanner or sometimes just a bar that you sent it through. What are we talking about here in terms of hardware and technology? What exactly is this capable of doing?

Brian Weiss  Yeah, so look, on the back-end infrastructure, we’re taking digitized documents, whether they’re scans or digitally native. The infrastructure is a platform on which we harness models, right, machine learning models that have been trained to do what people do. The first part of that is what we call computer vision. So, all the noise and the lines on a scan, people spill coffee on it, they fold it in half, or it’s skewed sideways: our models, which we’ve built over 10-plus years, correct for all of that. So we take the noise out before you even ask the question, what am I looking at? Then we’re essentially standing up models that answer that question: is it form XYZ, is it a 1040, or is it something else? Once you figure that out, what kind of data do I need to get off the page? And these are learning models. So: go find these five pieces of information. I’m not going to tell you where they are, Eric, I just need you to find them, right? And by the way, I’m not even going to remotely show you where they might be. You need to figure out, based on what you’ve seen to date, where this data lives on a page. And once you’ve done that, I need you to translate it into digital, and that means when I scribble my name and write outside the box, or I write a zero, run over it and scratch it out, I need you to figure that out, too. Underneath all of that, we have training servers on which you build the models that create this efficiency, and then a pipeline that is very efficient at running those models. And we use CPUs, not GPUs, because GPUs can be very expensive. So, we’re really proud of having effectively narrow-ish models.
You hear about a trillion parameters and things like that; we’re very focused on getting the job done. The other piece of innovation at Hyperscience is what we call just-in-time human in the loop. With the prior generation of technologies that did this, if the OCR gets it wrong, your only option is to hand the whole form off to somebody to do by themselves. What Hyperscience does uniquely is handle the case where the machine is a little bit unsure, right? Say I’ve got 100 fields I’m trying to get data out of, and for one of them I’m not quite sure I’m right. It’ll raise its hand and say, hey, I need a little help. So, with just a tiny bit of human intervention, just in time, I can now get that entire document through. It takes the person right to that field and says, help me figure this one out. So, that concept of just-in-time human in the loop to help models do their work, it’s kind of like a digital worker, right?
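To make the just-in-time idea concrete, here is a minimal Python sketch, assuming the extraction model attaches a confidence score to each field it reads. The `Field` type, `route_fields`, `legacy_manual_fields`, and the 0.90 threshold are hypothetical names and values chosen for illustration, not Hyperscience’s API. The sketch contrasts routing a single uncertain field to a reviewer with the older pattern, where one doubtful field sends the whole form to a person.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    value: str
    confidence: float  # model's self-reported confidence in [0, 1] (assumed)


def route_fields(fields: list[Field], threshold: float) -> tuple[list[Field], list[Field]]:
    """Just-in-time routing: accept confident fields, send only the rest to a person."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


def legacy_manual_fields(fields: list[Field], threshold: float) -> int:
    """Legacy-style handoff: one doubtful field sends the whole form to a person."""
    return len(fields) if any(f.confidence < threshold for f in fields) else 0


# Hypothetical 100-field form: 99 confident fields and one smudged entry.
form = [Field(f"field_{i:03d}", "ok", 0.97) for i in range(99)]
form.append(Field("date_of_birth", "10/12/1975", 0.62))

accepted, needs_review = route_fields(form, threshold=0.90)
print(len(accepted), [f.name for f in needs_review])  # 99 ['date_of_birth']
print(legacy_manual_fields(form, threshold=0.90))     # 100
```

In this toy case, a person checks one field instead of re-keying all 100, which is the efficiency difference Weiss is pointing at.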

Eric White  Well, that’s what I was going to ask. With your agency customers, what is their usual priority? What are they more worried about: workload or accuracy?

Brian Weiss  Great question. There is an inverse relationship there, and it’s actually a tunable setting in our platform. You can say, I care a lot about accuracy. And, by the way, if you’re going to use this data to train other models, that’s critical. In that case, the digital worker will raise its hand more often, because it’s not allowed to push anything through without a very high confidence level. So customers set that confidence level, and what you’re trading off is that people will have to touch a few more documents, more often, for the machine to meet its threshold. At the same time, customers are striking a balance between how much is automated without the machine raising its hand and the accuracy. We see accuracy rates of 99% and automation levels of 98%. Now, that’s in a world where 60% used to be considered good.
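The tunable setting Weiss describes amounts to choosing a confidence threshold. The Python sketch below assumes you have a validation set of (confidence, whether the machine’s answer was actually correct) pairs, for example checked against human-verified results. The `tradeoff` function and the ten sample pairs are made up for illustration and are not Hyperscience data or APIs.

```python
from typing import Optional


def tradeoff(samples: list[tuple[float, bool]], threshold: float) -> tuple[float, Optional[float]]:
    """Return (automation rate, accuracy of auto-accepted fields) at a threshold.

    Automation rate: share of fields accepted without a human.
    Accuracy: share of those auto-accepted fields that were actually correct
    (None if nothing clears the threshold).
    """
    if not samples:
        raise ValueError("need at least one sample")
    accepted = [correct for conf, correct in samples if conf >= threshold]
    automation = len(accepted) / len(samples)
    accuracy = sum(accepted) / len(accepted) if accepted else None
    return automation, accuracy


# Hypothetical, invented validation data for illustration only.
samples = [
    (0.99, True), (0.97, True), (0.95, True), (0.93, False), (0.90, True),
    (0.85, True), (0.80, False), (0.70, True), (0.60, False), (0.50, False),
]

for t in (0.50, 0.80, 0.95):
    automation, accuracy = tradeoff(samples, t)
    print(f"threshold {t:.2f}: automation {automation:.0%}, accuracy {accuracy:.0%}")
```

On this invented data, raising the threshold from 0.50 to 0.95 cuts automation from 100% to 30% while accuracy on the auto-accepted fields rises from 60% to 100%. In practice, the curve would be measured on far larger, human-verified samples before a threshold is chosen.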

Tom Temin  Brian Weiss, Chief Technology Officer at Hyperscience. Find this interview at federalnewsnetwork.com/federaldrive and subscribe to the Federal Drive wherever you get your podcasts.

Copyright © 2024 Federal News Network. All rights reserved.
