A new challenge to automate one of the most tedious jobs in government

Best listening experience is on Chrome, Firefox or Safari. Subscribe to Federal Drive’s daily audio interviews on Apple Podcasts or PodcastOne.

A lot of the data that the government produces needs to be rated: safe to distribute, controlled but unclassified, or maybe secret and classified. That’s a simplification of a huge and never-ending task. Now the Defense Department has launched a challenge prize program to develop an artificial intelligence approach to automating some of this tedious work. Federal Drive with Tom Temin got the details from Doris Tung, the acquisition division manager in the Philadelphia division of the Naval Surface Warfare Center.

Interview transcript:

Tom Temin: Miss Tung, good to have you on.

Doris Tung: Yes, thank you for having me.

Tom Temin: And you are looking for a system to identify, I guess, the CUI, the controlled but unclassified data. Let’s begin with that type of data. Is that the hardest to identify or the most subtle, or why start there?

Doris Tung: Well, the controlled unclassified information, which I’m going to refer to as CUI, is difficult to mark because it has over 120 categories, and there are subsets of those. So for an end user to identify whether a document requires a special marking or not can be pretty tedious. Whereas with classified documents, you’re pretty sure whether or not you’re working on a program that’s going to be, you know, secret, top secret, and the documents that are generated from that need to have their appropriate marking. So CUI has been around as a requirement for a while. But because of its vast number of categories, and special marking requirements, and also legacy markings with “for official use only,” and things like that, it can get complicated for a user to determine whether a document is CUI, and then, “How do I mark it?”

Tom Temin: And I imagine, there’s a great possibility for inconsistency from person to person or unit to unit or bureau to bureau, too?

Doris Tung: Oh, definitely. Think about all the documents that we generate in the federal government. We’re creating so many documents, especially electronically now, too. So you know, everyone is making their own decisions on whether it needs to be marked, and then doing it properly. Because there are very specific requirements on what you need to put on the header or the footer of the document. And then if you’re doing emails, you know, how do you distribute CUI? So there are specific requirements that an end user, from person to person, may not be aware of, and they’re just applying what they think is correct.

Tom Temin: And before we get into the details of the challenge you’ve launched, why is it coming through the Philadelphia division of the Naval Surface Warfare Center of all the possible places in the Navy?

Doris Tung: I’m a part of a Department of the Navy leadership program called “Bridging the Gap,” a development program focused on growing the Senior Executive Service. And so as part of this program, members of the Senior Executive Service from the Navy participate by providing real-life problems for the team to solve so we can do some action learning. And Mr. Alonzie Scott, who is an SES at the Office of Naval Research, presented his problem to this program and our team. You know, I’m coming out of Philadelphia. He presented a challenge of, you know, how do we simplify marking of controlled unclassified information using and leveraging automation and artificial intelligence and machine learning? I work in the contracts department, and I’m a contracting officer, and as part of the Naval Surface Warfare Center, we do have the authority to issue prize challenges. And that was a solution that our team came upon. You know, the team members consist of individuals across the Navy.

Tom Temin: Safe to say the output of this project could have Navy-wide implications, though, or even DoD wide.

Doris Tung: Right, right. Definitely. I mean, I think it could go beyond DoD, because we did have discussions as part of our market research with the Small Business Administration and the Defense Technical Information Center. And you know, people are all struggling to figure out how do you effectively implement this where the users understand how to mark it, and maybe taking some of that burden off of the end user. So it could have possible implications for Navy and perhaps beyond.

Tom Temin: We’re speaking with Doris Tung. She’s acquisition division manager in the Philadelphia division of the Naval Surface Warfare Center. Tell us about the challenge, then. This is not a grant program, but a prize challenge type of program. And who are you reaching out to? And what are you hoping to come up with?

Doris Tung: So we decided to go with the prize challenge method versus any traditional FAR, you know, Federal Acquisition Regulation-based contracting, because the prize challenge lets us go out to the public. So it can be companies, nonprofits, individuals, anyone can participate. There are certain restrictions, but generally, you know, anyone who has a solution can submit their idea. So the prize challenge is to ask if anyone has a solution where they can leverage artificial intelligence and machine learning to automate the marking of the document, and we’ve broken up the challenge into two phases. Phase one, which actually just closed, is a white paper to demonstrate, you know, what is their prototype, and then there will be a down-select, where we move on to phase two. And those individuals can then actually build a prototype, and then we’ll test it with actual documentation to see if they can mark it accurately. And the winner that will be selected, you know, would have the highest accuracy rate, so we’re excited to see what solutions industry and the public have for solving this problem.

Tom Temin: And do you have some objective sets of documents that everyone has agreed these are definitely CUI, because earlier, we talked about the variability that can come in there. And you mentioned 120 possible categories. And we’ve heard this for many years about how many layers there are. So what’s your reference type of data?

Doris Tung: So for the prize challenge, we are focusing on providing just a subset of the CUI categories. So focusing on privacy, procurement and law enforcement. So we have documentation that we know for sure is marked correctly. And there’s a sample set. You know, with artificial intelligence and machine learning, the more documents that you can feed the tool, the more the machine can learn. So they need the data. So we understand that part of this is that we need to give them a good data set for the tool to really learn. So we’ve kind of been scrubbing, as part of our team, developing these documents, ensuring that they’re safe to share with the public as well, for this challenge. But we are focusing on just certain subsets and then hopefully, you know, depending on what the outcome of this prize challenge is, then, you know, expanding beyond just those certain subsets.

Tom Temin: And do you also have patently un-CUI that you throw in there to kind of pressure test the algorithm, for example, like throwing in a comic book or a novel?

Doris Tung: I mean, we definitely thought that. We do have non-CUI documents so that the tool can learn what is CUI and what is not CUI. But that’s a good idea about throwing in a comic book. That’s something we’ll have to consider.

Tom Temin: And I was just wondering if the algorithm can also spot classified by accident that could get in there. That would be a feature, I think you would want to have like a red light comes on and says, “Hey, wait a minute, this is not only not unclassified, but it ought to be classified.”

Doris Tung: Oh, that would be an excellent enhancement for the tool. Right now we’re only focusing on just, can it even figure out, is it CUI or non-CUI? And then, you know, if people have an ability to even address that part, we would love to see if they incorporated that classified piece, because classified is also a piece. It could be CUI and classified. So there is really a lot of variability to documents. You know, hopefully, once we can even just solve this basic problem, then we can move on to see what kind of potential these tools could have. That would be something I think people would want.

Tom Temin: And what do you suspect are some of the techniques that this could be done by? For example, is it a simple word search and compare type of thing? Or is it more sophisticated than that? Is there context? Is there syntax? Because you’re dealing with mostly written documents; fair to say?

Doris Tung: That is fair to say that it is all written documents. So we did explore what software was existing out there. And there are CUI marking tools being developed out there now that use keyword searches. But we found that to be problematic, because you’re going to rely on an individual trying to identify all the keywords that could potentially flag a certain category. And so we’re talking about 120 categories, and then there are subsets. So do we have people who are able to really hone in on what keywords would flag each of those categories? So that’s why we moved toward artificial intelligence and machine learning, where the machine reads all these data sets, and then it can figure out, you know, which of these words are, you know, I mean, that’s the part where we’re hoping that the participants in the prize challenge are going to tell us, like, how can your machine do this?
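
The contrast Tung draws between hand-picked keywords and a model that learns from labeled documents can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the keyword lists, the toy documents, the three labels, and the choice of a small naive Bayes classifier are hypothetical and are not the challenge’s data, categories, or any participant’s method. The `accuracy` helper mirrors the idea of scoring entries by how often they mark held-out documents correctly.

```python
import math
import re
from collections import Counter

# Hypothetical hand-picked keywords; not taken from any official CUI list.
KEYWORDS = {
    "privacy": {"ssn", "birthdate"},
    "procurement": {"bid", "proposal"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_label(text):
    """Return the first category whose keyword appears, else 'non-cui'."""
    words = set(tokenize(text))
    for category, keys in KEYWORDS.items():
        if words & keys:
            return category
    return "non-cui"

class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        # Words never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def accuracy(predict, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict(d) == y for d, y in zip(docs, labels))
    return hits / len(docs)

# Invented toy documents standing in for a correctly marked sample set.
TRAIN = [
    ("employee social security number and home address", "privacy"),
    ("patient date of birth and medical record", "privacy"),
    ("vendor bid pricing for the source selection", "procurement"),
    ("contractor proposal evaluation and cost estimate", "procurement"),
    ("cafeteria lunch menu for friday", "non-cui"),
    ("parking lot closed for repaving", "non-cui"),
]
TEST = [
    ("home address and medical record of employee", "privacy"),
    ("cost estimate for source selection", "procurement"),
    ("friday parking reminder", "non-cui"),
]

train_docs, train_labels = zip(*TRAIN)
test_docs, test_labels = zip(*TEST)
model = NaiveBayes().fit(train_docs, train_labels)

if __name__ == "__main__":
    print("keyword accuracy:", accuracy(keyword_label, test_docs, test_labels))
    print("learned accuracy:", accuracy(model.predict, test_docs, test_labels))
```

On this toy data the keyword rule misses the first two held-out documents, because none of its hand-picked words happen to appear in them, while the learned model picks up on the other words that co-occurred with each label in training. A real entry would need far more data and a stronger model; the sketch only shows why maintaining keyword lists by hand scales poorly across many categories.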

Tom Temin: And now you’re getting the white papers in, what is the next phase? And does this become something that, as a technology transfer candidate or something, you would turn into a product that the Navy could buy?

Doris Tung: So the next phase after we review the white papers is the tech demo. And potentially, what we’re looking into is, you know, there’s new procurement vehicles and methods, such as the other transaction agreements out there. So we are looking into, you know, based on the success of phase two, where they do the demonstrations, we will then pursue whether it’s actually going to be something like a product that we can actually procure, or whether there just needs to be additional follow-up procurement methods to see. Because there are other Navy, Marine Corps operating system requirements that we also have to consider that right now the challenge isn’t really limiting the participants in that manner yet.

Tom Temin: Sure. So when you get this solved, maybe you can take on contract writing.

Doris Tung: Yes, I think it would really be a scenario that could really be delved into.

Tom Temin: You know, I guarantee you’d rocket to the SES if you got that one solved. Doris Tung is the acquisition division manager in the Philadelphia division of the Naval Surface Warfare Center. Thanks so much.
