How patent examination technology caught up to the 21st century




Tom Temin @tteminWFED

May 16, 2024 2:07 pm


Examiners at the Patent and Trademark Office (USPTO) typically have to look at thousands of documents to determine whether an application is valid. Thanks to the next guest on the Federal Drive with Tom Temin, those examiners now have artificial intelligence tools to work faster and more accurately. For his work, he's a finalist in this year's Service to America Medals program, and his is the first of the finalist interviews we will be bringing you this year. Talking with Temin is Jerry Ma, the Director of Emerging Technology and Chief AI Officer at USPTO.

Interview Transcript:

Tom Temin Well, tell us what you’ve done here. There’s a lot of paperwork. A lot of it’s online, I guess, nowadays. And so how does AI come to bear on patent examinations?

Jerry Ma Certainly. It's helpful to start with the realization that we at the USPTO are America's innovation agency, both in our statutory functions and in our constitutional mission. We help incentivize and foster innovation across all fields of technology and science, including artificial intelligence. A lot of my portfolio is about figuring out how we leverage the innovations of today to serve the innovators and entrepreneurs of tomorrow. And if we look at the AI community, especially post-2022 with the latest boom in generative AI, there's a whole world of potential out there to harness in serving our internal stakeholders, that is, our personnel and our expert examiners, as well as our outside stakeholders, that is, the general public who relies on us for patent- and trademark-related services. There's an opportunity to serve all of these communities through emerging, modern technology that helps them contend with the increasing complexity of their roles within the overall intellectual property ecosystem, helps them work more efficiently or at higher quality, empowered with more information and context, and overall contributes to a sounder IP ecosystem. So a lot of the individual tools that we develop at the USPTO are directed to furthering one of those aims within the context of a specific use case and user community. Take our patent examiners, who rely on world-class search services to trawl through, without exaggeration, tens if not hundreds of millions of documents spread across multiple databases.


Tom Temin Let's talk about that one for a moment, because at one time you could go to the library and see most of the literature for a particular innovation on paper. Now there are hundreds of thousands of applications a year and, as you say, perhaps millions of documents. Tell us how you've revved that up so that it's even doable. It sounds like the task could be getting beyond the range of possible without some of these new tools.

Jerry Ma Indeed. Like many things at the USPTO, this has been a gradual progression from what had been a very analog process to what we see today and what we're aiming for tomorrow. Think back to before the age of computers: there was still a need for patent examiners to search, because among their many other responsibilities, one of their core functions is to examine any given application against the universe of what has been done before. And certainly before the age of computers, there was already a voluminous collection of prior art in many technical fields, and examiners had to figure out some way to trawl through it. So before computers, we had this intricate filing system. That was decades before my time, but I hear stories from our veteran examiners about these shoes that they would trawl through in the recesses of our old USPTO headquarters.

Tom Temin They were shoe boxes, not shoes.

Jerry Ma You know what, for the life of me, I can't remember why they're called shoes, but it might have been because they resembled shoe boxes or had some other connection. This is just showing my relative youth and inexperience; I'm already spacing on the etymology of some of this analog technology, as it were. But anyway, in the analog world a couple of decades ago, we had a lot of data but not many effective ways to trawl through it. So a lot of our examiners' time was spent just sorting through shoes, trying to build muscle memory of which document existed in which shoe. Sometimes when you took a document out, another examiner who was relying on that document then wouldn't be able to get to it. So our first phase of modernization and innovation, as it were, was actually well before my time. We digitized these archives and went from this shoe-based manual searching system to computerized search. That was already a huge sea change in how our examiners are able to do their jobs and in how easy we make it for them to access everything they need to perform their duties effectively. So that's stage one of innovation.

Jerry Ma However, stage one still left a lot to be desired, because the state of the art in information retrieval back when we made this first transition was, by and large, keyword based. If you think about the revolutions in information retrieval a few decades ago, when things like Google were coming out, those were fundamentally keyword-based technologies. Google's key innovation, of course, was figuring out how to rank the results retrieved via keyword-based retrieval. And they did it very well, which explains why they're such a big deal now. But keywords can only take you so far, because if it's too difficult to operationalize a concept in your mind, or a concept that you see in a patent document, with one single keyword or even a collection of keywords, then you're just not going to be able to retrieve everything you need to make a sound determination. If there are five different ways of referring to a concept, and I can only think of three of them, then the other two are just not going to be accessible to me. So if there are any prior art documents that talk about the same technology or concept using the two words I forgot about, I'm out of luck, whether as an examiner or as a public searcher. That's where AI comes in. Today's modern AI technologies are not perfect; there are still certainly issues and other sorts of errors. But one thing they are much better at than the technologies of the last generation is semantic document representation, that is, representing meaning. So now with AI, I can either type in a concept or refer to a concept using other documents that contain or embody it. I can turn that into what we call in the AI world embeddings: points in super-high-dimensional space, sometimes 512 or 1,024 dimensions. So not your typical 3D or 4D movie. You place all these documents as 1,024-dimensional points. And by virtue of the way in which you train these models, paragraphs, documents and words that are similar in meaning will be grouped together in that space, and ones that are less similar in meaning will be placed far apart. That's how, even if I'm searching for the concept of a computer and some other document refers to a laptop or a mobile processing device, AI lets me make connections that keywords would not have allowed.
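The contrast Ma draws between keyword matching and embeddings can be illustrated with a toy example. This is a minimal Python sketch, not the USPTO's system: the document snippets, the hand-made 4-dimensional vectors (standing in for the 512- or 1,024-dimensional embeddings a trained model would produce) and the 0.8 similarity threshold are all illustrative assumptions. Cosine similarity is used here as one common way, though not the only one, to compare embeddings.

```python
import math

# Hypothetical prior-art snippets; none of them contains the word "computer".
DOCS = {
    "doc_a": "a laptop with a hinged display",
    "doc_b": "a mobile processing device that executes instructions",
    "doc_c": "a hydraulic valve assembly with a piston rod",
}

# Hand-made toy embeddings (4 dimensions instead of 512 or 1,024), chosen so that
# the computing-related snippets sit close to the query in vector space.
EMBEDDINGS = {
    "doc_a": [0.9, 0.8, 0.1, 0.0],
    "doc_b": [0.8, 0.9, 0.2, 0.1],
    "doc_c": [0.1, 0.0, 0.9, 0.8],
}
QUERY_TEXT = "computer"
QUERY_VEC = [1.0, 0.9, 0.0, 0.1]  # assumed embedding of the query "computer"


def keyword_search(term, docs):
    """Return ids of documents that contain the exact query term as a word."""
    return [doc_id for doc_id, text in docs.items() if term in text.split()]


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_search(query_vec, embeddings, threshold=0.8):
    """Return ids whose similarity to the query clears the threshold, best first."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in embeddings.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score >= threshold]


print(keyword_search(QUERY_TEXT, DOCS))        # []
print(semantic_search(QUERY_VEC, EMBEDDINGS))  # ['doc_a', 'doc_b']
```

The keyword search finds nothing because no snippet uses the literal word "computer", while the vector comparison surfaces the laptop and mobile-device snippets and leaves the valve assembly out.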

Tom Temin The result, therefore, is that examiners have access to and visibility into much more than they would have with keyword search alone. How do you operationalize that, such that the examiner doesn't have to be an AI programmer, or even necessarily a prompt expert, but can use this capability in his or her daily work and simply get better results?

Jerry Ma That's a great question, Tom. It goes to the heart of how we develop AI products at the USPTO. When you think about AI at the USPTO, it's not just about the models and these high-dimensional embeddings. We're not just thinking about the under-the-hood engine; we realize that around the engine, you have to construct the whole end-to-end vehicle that someone can actually use to accomplish their job effectively. So we've invested a ton of effort in making this technology as accessible to end users as possible, across a variety of user interfaces. We have one tool where you can pull up a document and, with a click of a button, and I'm not exaggerating, literally a click of a button, instantly draw connections between that document and other documents in our database that are judged to be similar. So if you're looking at a pending patent application, you can connect the application, which might refer to a computer, with something else in our database that might refer to a laptop or mobile processing device, without even thinking about how to run inference on that AI model. We have built the user interface, the scaffolding, on top of this very powerful base technology, so that users can make use of it in ways that don't go far beyond the interfaces and modes of interaction they're already accustomed to.
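A one-click "find similar documents" feature of the kind Ma describes could, in principle, reduce to a top-k nearest-neighbor lookup over stored embeddings. The sketch below is hypothetical: the `SimilarityIndex` class, its `add` and `find_similar` methods, the document ids and the vectors are invented for illustration, and a system searching millions of documents would more likely use an approximate nearest-neighbor index than this brute-force scan.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class SimilarityIndex:
    """Hypothetical brute-force embedding index (illustration only)."""

    def __init__(self):
        self._vectors = {}

    def add(self, doc_id, embedding):
        self._vectors[doc_id] = normalize(embedding)

    def find_similar(self, doc_id, k=3):
        """Return up to k ids most similar to an indexed document, excluding itself."""
        query = self._vectors[doc_id]
        candidates = (
            (sum(a * b for a, b in zip(query, vec)), other_id)
            for other_id, vec in self._vectors.items()
            if other_id != doc_id
        )
        return [other_id for _, other_id in heapq.nlargest(k, candidates)]


# Toy data: an application mentioning "a computer" and three prior-art records.
index = SimilarityIndex()
index.add("application", [1.0, 0.9, 0.0, 0.1])
index.add("prior_art_laptop", [0.9, 0.8, 0.1, 0.0])
index.add("prior_art_mobile_device", [0.8, 0.9, 0.2, 0.1])
index.add("prior_art_valve", [0.1, 0.0, 0.9, 0.8])

# The "one click": no query to write, just the document the examiner has open.
print(index.find_similar("application", k=2))
# ['prior_art_laptop', 'prior_art_mobile_device']
```

Normalizing vectors once at insertion time means each comparison is a plain dot product; the user only ever sees the returned list, which is the kind of abstraction over the model that Ma describes.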

Tom Temin So basically, you’ve abstracted all of this complexity of the AI deployment and design underneath the interface for the examiners.


Jerry Ma Indeed, there's not going to be a single examiner who needs to run a script or program in order to use these AI capabilities. That's not to say there are no transition hurdles, because there certainly are. It's still going to be a bit of a paradigm shift to go from thinking about documents purely in terms of keywords to this messier concept of semantic meaning. But it's certainly something we're trying to make as smooth as possible, and we're really opening people's eyes to the fact that semantic meaning and semantic representations are going to be the technologies of tomorrow, and will gradually become how we think about the thorny problem of information retrieval, whether in the IP space or anywhere else.

Tom Temin How do AI models deal with, say, metaphors or representations? For example, you could describe a process by drawing a diagram, but what the diagram depicts doesn't literally exist in that form, because what you're describing is at the level of molecular structure. Yet the diagram looks like something that might be part of a very tangible mechanical system. So you could fool the model into thinking it's looking for a mechanical system of valves and rods, when what was really described was something chemical, and the metaphor was just there so someone could understand it visually. Does that make sense?

Jerry Ma It does. Some of what you referred to, Tom, gets at this idea of multimodal modeling and content understanding, where you have not only competing substance across documents but actually competing forms of content within documents. As you say, some parts could be diagrams, some could be chemical molecules, and other parts of the document are just plain English. How do you reason about each, and how do you effectively integrate your reasoning across those different forms of content? There's been a lot of work in the multimodal space. In fact, many of the leading commercial large language models are actually multimodal: they process visual inputs as well as text inputs in equal measure. What we have to think about, though, is that we're not just operating in the realm of images and text. We're operating with things like the diagrams you referred to, textual representations of chemical molecules and, in the more electronic arts, pseudocode that represents a given procedure or algorithm for practicing a computer-based invention in this or that way. So we have to think about many more modes of content than is typical in the AI world, even in multimodal work. There's not really a silver bullet; there's not one model to rule them all, at least with current technology, for bringing all these pieces together. So when we have a distinct and discrete need to understand a given form of content, we'll typically build a tool that's custom tailored to that form of content. We might integrate it into other workflows that address other forms of content. But our by-and-large belief is that if you need to solve a problem, build a tool that's directed to that task. Don't go around with some purported super AI model and just expect it to solve everything for you. That's not how these things typically bear out in reality, and that's not how we wish to develop AI at the USPTO.
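Ma's "build a tool directed to the task" approach can be pictured as routing each kind of content in a document to its own specialized component. In this minimal Python sketch, the `Segment` type, the modality labels and the handler functions are hypothetical placeholders (each real handler would wrap a model or parser suited to that content, such as a chemistry-aware tool for a line notation like SMILES); it is an illustration of the idea, not a description of the USPTO's actual architecture.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    modality: str  # hypothetical labels: "text", "chemical", "pseudocode", ...
    content: str


# Placeholder handlers. Each stands in for a tool built for one kind of content;
# here they only return a tag so the routing is visible.
def handle_text(content: str) -> str:
    return f"text-tool:{len(content.split())} words"


def handle_chemical(content: str) -> str:
    return f"chem-tool:{content}"


def handle_pseudocode(content: str) -> str:
    return f"code-tool:{len(content.splitlines())} lines"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "text": handle_text,
    "chemical": handle_chemical,
    "pseudocode": handle_pseudocode,
}


def process_document(segments: List[Segment]) -> List[str]:
    """Send each segment to the tool registered for its modality."""
    results = []
    for seg in segments:
        handler = HANDLERS.get(seg.modality)
        if handler is None:
            # No dedicated tool: fail loudly instead of guessing with a generic model.
            raise ValueError(f"no dedicated tool for modality {seg.modality!r}")
        results.append(handler(seg.content))
    return results


doc = [
    Segment("text", "A method for producing ethanol"),
    Segment("chemical", "CCO"),  # SMILES notation for ethanol
    Segment("pseudocode", "for batch in batches:\n    ferment(batch)"),
]
print(process_document(doc))
# ['text-tool:5 words', 'chem-tool:CCO', 'code-tool:2 lines']
```

Raising an error for an unhandled modality, such as a diagram in this sketch, is a deliberate design choice: it surfaces the gap instead of quietly handing the content to a general-purpose model.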

Tom Temin Yes, there are five different types of screw heads available nowadays, and you have to pick the right one for the right application.

Jerry Ma There are many more than five different state of the art AI models these days.

Tom Temin None of this will ever replace the need for the examiner’s judgment, though, will it?

Jerry Ma No, for a couple of different reasons. The first, and perhaps the easiest to explain, is that right now and for the indefinite future, the technology just isn't there. The technology's not there to make every subtle judgment, distinction and nuance that our examiners are trained to make. Before they get to the PTO, each of them has invested at least four, and in many cases more, years in pretty rigorous STEM training. And when they get to the PTO, we invest many more months in bringing them up to speed with the legal expertise to make these examination determinations. So AI is not going to replace the judgment, the nuance, the subtle distinctions that we rely on our examiners to make on a daily basis. What it can do is provide examiners with context and information, and clear out some of the busywork on their plate, so that their day-to-day work, and what they actually spend their time on as expert professionals, is directed at the heart of what matters in any given application. We don't want them spending two hours a day filling out administrative forms. We want them spending as much time as humanly possible devoting their expertise, discretion and judgment to the things that will benefit most from it.
