Air Force finds algorithm can help predict an officer’s future performance

Tom Temin @tteminWFED

June 8, 2021 11:56 am

Can artificial intelligence help in the management of human intelligence or specifically human talent management? That’s the specific question the RAND Corporation decided to look into. With what investigators found, RAND policy researcher David Schulker joined Federal Drive with Tom Temin.

Interview transcript:

Tom Temin: Mr. Schulker, good to have you on.

David Schulker: Thanks, Tom, I appreciate the invitation.

Tom Temin: And you looked at the domain of Air Force talent management, specifically looking at officers and the management of their promotions, I guess, and assignments. This was not something the Air Force asked you to do. It was just something you initiated for experimental purposes?

David Schulker: That’s correct. The Air Force has supported a wide range of research on potential applications of artificial intelligence to improve talent management. This particular work was part of RAND’s internal research and development program, where we test and prototype ideas for the Air Force so that we can bring them a more finished product that we can verify actually works before we approach them for research support.

Tom Temin: All right, and what were you specifically looking at?

David Schulker: So the question that we were really looking at with this study centers around performance evaluations. And so when you think about it, if your goal for an HR system is to deliver people with the right skills and abilities in order to accomplish the organization’s mission, then the information that gets collected in a performance evaluation is of central importance, because that’s where every single person is compared against the requirements of their job, and evaluated on how well they’re performing it. And so if you were in charge of the recruiting function, for example, you would like to know whether the people that you’re recruiting are being successful when they’re placed in their jobs, and potentially, you’d like to know which people are the most successful so that you could recruit more of them in the future. And so in a large organization, this becomes really difficult, because performance evaluations are typically made up largely of narrative text. Sometimes they’ll have rating scales and things, but they usually have a significant narrative component. And so if you have tens of thousands or 100,000 people in your organization, if you’re in charge of recruiting now, it becomes very difficult to sort through all those narratives and try to understand whether you’re recruiting the right people. And so the overall idea of this research is, could artificial intelligence, could natural language processing help us with this problem, so that we could unlock the information that’s trapped in those performance narratives, and then all the other pieces of HR could use it in order to improve their processes? So that was the fundamental question we were investigating.

Tom Temin: Yeah so the essential question, then is this text that is written – natural, written language processing – that seems to be the nut here.

David Schulker: That’s the idea. If you only have a small number of people, then you can pick up those evaluations, and you can read them and understand them. But when you have 10,000 people, and you’re trying to sift through and understand who the top performers are, and maybe try to find ways that you can develop people in order to improve their performance in the future, it just becomes really difficult because you can no longer manually review all that text. And so that’s an area where we thought maybe artificial intelligence could help you sift through the text, help you process it, and then you could, instead of using other information, you could use that rich information from the performance reviews to help all the other HR functions.

Tom Temin: And let me ask you this, it seems like one of the challenges then would be the fact that in a larger organization – in the case of the Air Force you have tens of thousands of people being written up perhaps by thousands of people – or maybe it’s hundreds of thousands being written up by tens of thousands – and everyone has a different style and approach to the same thing. So one person would write “Joe Schmo is a great performer, but he’s kind of a jerk.” Someone else would say, “Joe Schmo meets the job requirements in a consistent manner, but has difficulty relating to people surrounding him” – totally different uses of language to say the same thing. Is that one of the challenges?

David Schulker: In this case, when you’re talking about military writing, you actually have some advantages that you wouldn’t have in a regular organization with the way performance evaluations are written. And so in the military, typically, the language and the way that you describe performance is very tightly regulated. And there’s a system of how you write that gets passed on from generation to generation so that younger officers are taught: This is the way that you identify a top performer, and this is the way that you identify somebody who needs improvement. And all of that information is very strictly regulated: you’re only allowed to write certain things. Certain things are widely recognized as key signals that are reserved only for the top performers. And so we have a bit of an advantage there, because there’s a great deal of standardization in the way the writing is done that you wouldn’t expect to have in a regular organization where managers can just write open-ended text about what somebody is doing.

Tom Temin: Got it. So it’s “Air Force natural language” and not really natural language, to put a finer point on it. Got it.

David Schulker: Sure. And you almost might not even call it natural language. If you saw what I’m talking about when I say a narrative, they’re these bulleted statements. You might imagine they’re full of acronyms. They all have to fit on one line. So oftentimes, you just delete characters and put apostrophes and things. And so if you look at one of these bullets, you wouldn’t be able to understand it. It’s very cryptic. And so in a way, it’s not quite natural language. But we can use the same types of techniques that are used for other language that appears like that. For instance, if you’re analyzing a tweet, you have the same sort of thing where you have choppy words and typos and things like that. Those are basically the same types of techniques that we use to analyze these Air Force performance evaluations.
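One common way to handle choppy, abbreviated text of the kind described above is to compare character n-grams instead of whole words. The sketch below is only an illustration under stated assumptions: the interview does not say which features the RAND model used, and the function names and the abbreviation `ldrshp` are invented for this example.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams in a lowercased, cleaned string."""
    # Keep letters, digits, and spaces; drop apostrophes and other punctuation.
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum() or ch == " ")
    cleaned = " ".join(cleaned.split())
    padded = f" {cleaned} "  # pad so word boundaries become part of the n-grams
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


# Hypothetical abbreviation, not taken from a real evaluation.
print(jaccard("ldrshp", "leadership"))  # shares the trigram "rsh"
print(jaccard("ldrshp", "logistics"))   # shares no trigrams
```

Because an abbreviation usually keeps some of the character sequences of the full word, it still overlaps with the word it stands for, which a plain word-level match would miss.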

Tom Temin: I guess for purposes of algorithms, then, it may not be natural speech, but it’s also not structured data, either.

David Schulker: Exactly. Yeah, it’s not very natural but it’s also unstructured. And so you need some way to deal with that.

Tom Temin: All right, we’re speaking with David Schulker. He’s a policy researcher at the RAND Corporation. And what did you find here? Is it possible to automate this in some way or apply AI to it?

David Schulker: We definitely found that it is. And so our goal to test this idea was to create an algorithm or a machine learning model that could score an officer’s record the same way a human judge, if they were to pick up that record and evaluate them for a promotion or developmental assignment, might score the record. And so what we did was we developed a model that fit the data, and then we tested how it performed to see whether it was accurately picking up on the right performance signals. And if I showed you the phrases and the key words that it picked up on, any senior officer in the Air Force would recognize those as key signals for top performers.

Tom Temin: Well give us a couple of those phrases and words.

David Schulker: Sure. So in the Air Force, for instance, one of the key parts of the performance narrative is the push statement. And that’s where you recommend an officer for their next assignment or their next job. And the higher you recommend them, the better you’re signaling about their performance. So if you recommend them for staff on a numbered Air Force or a [major command], which are higher levels of organization – or even for Joint Staff, which is the highest level of organization – that’s a very big signal. And so the word “joint” was all over the keywords that correlated highly with performance. “Joint Staff” or “go to Joint Staff next,” those types of phrases came out as very key signals in what the model was picking up on. And so we recognized straight away that the model was picking up on some of the right things that an officer would recognize, too.
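The idea of keywords that correlate with performance can be illustrated with a small word-level comparison. This is a minimal sketch, not the RAND model: it assumes a smoothed log-odds comparison of how often each word appears in records labeled as top performers versus the rest, and every record, label, and name below is invented for the example.

```python
import math
from collections import Counter

# Invented records: (narrative text, 1 = judged a top performer, 0 = not).
RECORDS = [
    ("top 1 of 12 officers; send to joint staff next", 1),
    ("my best planner; joint staff a must", 1),
    ("led 40-member team; ready for majcom now", 1),
    ("solid officer; continue to develop", 0),
    ("met all suspenses; assign to squadron staff", 0),
    ("reliable performer; continue current duties", 0),
]


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    words = (w.strip(";,.").lower() for w in text.split())
    return [w for w in words if w]


def log_odds(records, smoothing=1.0):
    """Smoothed log-odds that a word appears in a top record versus another record."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for text, label in records:
        words = set(tokenize(text))  # count each word once per record
        if label == 1:
            pos.update(words)
            n_pos += 1
        else:
            neg.update(words)
            n_neg += 1
    scores = {}
    for word in set(pos) | set(neg):
        p = (pos[word] + smoothing) / (n_pos + 2 * smoothing)
        q = (neg[word] + smoothing) / (n_neg + 2 * smoothing)
        scores[word] = math.log(p / (1 - p)) - math.log(q / (1 - q))
    return scores


scores = log_odds(RECORDS)
for word in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(word, round(scores[word], 2))
```

On this toy data the word “joint” ranks highest because it appears in two top records and in none of the others. A trained model would learn such weights from many thousands of real records and would not rely on a hand-built list.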

Tom Temin: And what might some of the challenges be with applying this in a real world situation?

David Schulker: The challenges, I think, come in when you go to implement it. Because when you’re talking about HR, the decisions that you’re making are affecting people’s lives and careers. And so you have to be extremely careful with how you use the machine and how well you regulate it. You don’t necessarily want to unleash it, because it could make a decision that the organization itself wouldn’t support. And there’s also the issue of transparency. Because if you’re going to use a machine like that to affect people, then you need to make sure they understand how it’s being used, and they understand how it’s affecting them. And so all of those things, the research literature has recognized, are big challenges that you have to wrestle with and that make HR a little bit different from your garden-variety business operation.

Tom Temin: Sure. And if you have an AI system making recommendations, people, I would think, would still have to evaluate the decision that came out of the AI program, and would still have to oversee it. What, then, have you gained by using AI?

David Schulker: So I think you think about the advantages and disadvantages of AI versus a human intelligence. In this case, the big thing that you gain is you can now apply this type of analysis at a larger scale. So before, you would reserve the review of the performance evaluations for certain high-stakes situations where it’s worth it to dedicate the labor to go through all the evaluations, something like a promotion event. But now you have the ability to monitor performance basically every year, because you can score everyone’s records as this continuous data stream comes in year after year. And so that’s a key advantage. There’s also the issue that human judges can sift through the complex data, but they also might miss something. They also are not unbiased, and the machine, because it’s in a certain way standard, might pick up on something that humans missed. And so you can imagine a sort of teamwork between the humans and the machine where the machine says, “Hey, you might have overlooked this. I think this record is pretty good. You might want to take another look at it.” I think they can kind of help each other overcome each other’s limitations in that way.
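The human-machine teamwork described above can be sketched as a simple disagreement check: the machine does not decide anything, it only lists records it scored highly that human reviewers did not select. The record fields, the scores, and the 25 percent cutoff are all assumptions made up for this illustration.

```python
# Invented example data: a model score and the human panel's decision per record.
RECORDS = [
    {"id": "A", "model_score": 0.92, "human_selected": True},
    {"id": "B", "model_score": 0.88, "human_selected": False},
    {"id": "C", "model_score": 0.41, "human_selected": True},
    {"id": "D", "model_score": 0.37, "human_selected": False},
    {"id": "E", "model_score": 0.35, "human_selected": False},
    {"id": "F", "model_score": 0.30, "human_selected": False},
    {"id": "G", "model_score": 0.22, "human_selected": False},
    {"id": "H", "model_score": 0.10, "human_selected": False},
]


def flag_for_second_look(records, top_fraction=0.25):
    """Return ids the model ranks in its top fraction but humans did not select."""
    ranked = sorted(records, key=lambda r: r["model_score"], reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))
    top_ids = {r["id"] for r in ranked[:cutoff]}
    return [r["id"] for r in records
            if r["id"] in top_ids and not r["human_selected"]]


print(flag_for_second_look(RECORDS))  # prints ['B']
```

The output is a list for a person to review again, so the final decision stays with the human judges.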

Tom Temin: Do you think this methodology then applies perhaps to the Army and the Navy as well, if you were to train the algorithm with their evaluation language?

David Schulker: Sure. I think all the services use slightly different forms when it comes to documenting performance evaluations. But they all have a narrative text component. The Air Force is a bit unique, because their evaluations were 100% text. There’s no rating scales, it’s all text. And so it was uniquely an opportunity for the Air Force. But all of the services have a narrative block. It’s always important because that’s the chance that the evaluator really has to communicate directly to the HR system and say, “This is what you need to know about this person. This is how you interpret the rest of this form.” And so these techniques absolutely have application to all the other services as well.

Tom Temin: And what happens next with these findings?

David Schulker: Well, the great thing about this report is that it proved the concept. It essentially said that the model performed at an acceptable level of accuracy, and it showed that it can detect the same kinds of performance signals without us having to explicitly tell it what to look for. And that has a huge advantage because as the narratives change over time, as the Air Force’s missions change and new things become important, the model can adapt to that information. And so given that the idea has been shown to work, the next steps, I think, are to continue to think through the ways that we might use it, continue to develop it into something that could actually be put into practice, and then to partner with the Air Force, using RAND’s expertise to help them overcome some of those challenges that we talked about before and make sure that they can safely use this to improve their decisions without any unwanted consequences.

Tom Temin: David Schulker is a policy researcher at the RAND Corporation. Thanks so much for joining me.

David Schulker: Thanks, Tom. I really enjoyed the conversation.