The Army Research Office is trying to combat that, however.
“Helen Li, electrical and computer engineer at Duke University, is working on defending against these kinds of attacks,” MaryAnne Fields, program manager for intelligent systems at the Army Research Office, told Federal News Network. “The general idea is they want to make the neural network less sensitive to triggers in general by identifying the types of triggers that could fool the classifier, and then train the program to ignore those.”
There’s a lot to unpack there. Basically, an adversary can place a set of pixels in an everyday item. When the computer looks at the image, the pixels throw the computer off. Think of it like a QR code secretly embedded in something like an earring or a hat logo.
“The backdoor attack is an attack at the data that’s being fed into the algorithm,” Fields said. “If the adversary can get to the data before we train the algorithm to start recognizing the images, that’s where the backdoor can occur. You’re leaving the code alone, but you’re changing the data.”
Fields said the backdoor image may make the algorithm associate every image of a gold medal with Michael Phelps because it places too much weight on the medal and not other parts of the image.
You could see how that might be a serious problem if a computer starts categorizing every image that contains a tent as an enemy camp.
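To make the idea concrete, here is a minimal sketch of how tampering with training data could plant such a backdoor. It is an illustration only, not the attack studied at Duke: the function name `poison_dataset`, the white square used as a trigger, the 10% poisoning rate and the toy data are all assumptions made for this example.

```python
import numpy as np

def poison_dataset(images, labels, target_class, fraction=0.1, patch_size=3, seed=0):
    """Return copies of images/labels with a trigger stamped on a random subset.

    Hypothetical helper for illustration; the trigger is a bright square
    in the bottom-right corner of each chosen image.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()   # leave the caller's clean data untouched
    labels = labels.copy()
    n_poison = int(len(images) * fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Stamp the trigger pixels onto the chosen images.
    images[idx, -patch_size:, -patch_size:] = 1.0
    # Relabel them so a model trained on this data links the patch to target_class.
    labels[idx] = target_class
    return images, labels, idx

# Toy data (assumed): 100 grayscale 8x8 "images" with labels 0 to 4.
rng = np.random.default_rng(1)
clean_images = rng.random((100, 8, 8)) * 0.5
clean_labels = rng.integers(0, 5, size=100)

poisoned_images, poisoned_labels, poisoned_idx = poison_dataset(
    clean_images, clean_labels, target_class=3
)
```

Note that the training code itself is never touched. Only the data changes, which is the point Fields makes above.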
Li’s work is trying to weed out these backdoor attempts.
She examines images with possible triggers to see whether a particular set of pixels can cause problems for machines while they are learning.
“For example, with QR codes, she’s asking the question, ‘Can a particular QR code cause a problem?’ If it can, that becomes part of the distribution,” Fields said.
Li can then train the computer to weed out images with malicious pixels in them.
“To identify a backdoor trigger, you must essentially find out three unknown variables: which class the trigger was injected into, where the attacker placed the trigger and what the trigger looks like,” Qiao Ximing, one of the members of the team at Duke, said in an Army Research Office release.
“Our software scans all the classes and flags those that show strong responses, indicating the high possibility that these classes have been hacked,” Li said. “Then the software finds the region where the hackers laid the trigger.”
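The scanning step Li describes can be pictured as outlier detection across classes. The sketch below is not the Duke team’s software. It assumes each class has already been given a single numeric “response” score (how that score is computed is left out), and it flags unusually high scores using the median absolute deviation, a common robust statistic. The function name, the threshold of 3.5 and the sample scores are all hypothetical.

```python
import numpy as np

def flag_suspicious_classes(scores, threshold=3.5):
    """Return indices of classes whose score is an unusually high outlier.

    Hypothetical helper: uses a modified z-score based on the median
    absolute deviation (MAD), which is less distorted by the outliers
    themselves than a mean/standard-deviation test would be.
    """
    scores = np.asarray(scores, dtype=float)
    median = np.median(scores)
    mad = np.median(np.abs(scores - median))
    if mad == 0:
        return []  # all scores (nearly) identical, nothing stands out
    modified_z = 0.6745 * (scores - median) / mad
    return [int(i) for i in np.where(modified_z > threshold)[0]]

# Assumed per-class response scores; class 3 responds far more strongly.
class_scores = [0.9, 1.1, 1.0, 7.5, 0.95, 1.05]
suspicious = flag_suspicious_classes(class_scores)
```

With these made-up scores, only class 3 is flagged, mirroring the idea of singling out the class most likely to have been tampered with.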
The next step is to figure out what the trigger looks like.
“Because the tool can recover the likely pattern of the trigger, including shape and color, the team could compare the information on the recovered shape — for example, two connected ovals in front of eyes, when compared with the original image, where a pair of sunglasses is revealed as the trigger,” the Army Research Lab release states.
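As a much simplified picture of comparing a suspect image against the original, the sketch below finds the rectangle where two images differ. It assumes a clean copy of the image is on hand, which a real detection tool generally would not have, so it should be read as an illustration of the comparison step rather than of the team’s recovery method. The function name `trigger_bounding_box` and the toy images are assumptions.

```python
import numpy as np

def trigger_bounding_box(clean, suspect, tol=1e-6):
    """Return (top, bottom, left, right) of the region where two images differ.

    Hypothetical helper; bounds are inclusive pixel indices. Returns None
    when the images match within the tolerance.
    """
    diff = np.abs(np.asarray(suspect, dtype=float) - np.asarray(clean, dtype=float)) > tol
    if not diff.any():
        return None
    rows = np.where(diff.any(axis=1))[0]  # rows containing any changed pixel
    cols = np.where(diff.any(axis=0))[0]  # columns containing any changed pixel
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

# Toy example (assumed): a blank 8x8 image and a copy with a bar-shaped trigger.
clean = np.zeros((8, 8))
suspect = clean.copy()
suspect[2:4, 1:6] = 1.0

box = trigger_bounding_box(clean, suspect)
```

Here `box` covers rows 2 to 3 and columns 1 to 5, the wide, short bar that stands in for something like a pair of sunglasses.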
The research is being conducted as part of a short-term innovative research grant, which awards up to $60,000 for a nine-month effort.