Presentation Type
Poster
Location
Portland State University
Start Date
5-7-2019 11:00 AM
End Date
5-7-2019 1:00 PM
Subjects
Machine learning, Neural networks (Computer science), Evolutionary programming (Computer science), Genetic algorithms
Abstract
In machine learning research, adversarial examples are normal inputs to a classifier that have been specifically perturbed to cause the model to misclassify the input. These perturbations rarely affect the human readability of an input, even though the model’s output is drastically different. Recent work has demonstrated that image-classifying deep neural networks (DNNs) can be reliably fooled with the modification of a single pixel in the input image, without knowledge of a DNN’s internal parameters. This “one-pixel attack” utilizes an iterative evolutionary optimizer known as differential evolution (DE) to find the most effective pixel to perturb, via the evaluation of numerous candidate solutions with a specific fitness function. We first improve upon the original implementation of the attack by designing a fitness function to minimize the magnitude of the perturbation in addition to the network confidence. The original attack achieves a success rate of 37% on our basic model with a mean attack RMSE of 0.02418; the improved attack achieves a success rate of 38% with a mean attack RMSE of 0.01946. We then explore the attack’s efficacy by comparing its performance across neural networks of different depths, and analyze the technique by computing per-pixel heatmaps of vulnerabilities in input images. Our findings highlight the applicability of the technique across networks, while at the same time demonstrating the shortcomings of DE in maximizing the attack potential. Future work could address these shortcomings, as well as extend the one-pixel attack to new domains (e.g., video).
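To make the attack concrete, the following is a minimal sketch of an untargeted one-pixel attack driven by differential evolution, with a fitness term that penalizes both the model's confidence in the true class and the perturbation magnitude (RMSE), in the spirit of the improved fitness function described above. It uses SciPy's differential_evolution as a stand-in optimizer rather than the authors' own DE implementation, and the classifier `model` (Keras-style predict returning class probabilities), the normalized H x W x 3 image, and the weighting `lam` are illustrative assumptions, not the exact experimental setup.

```python
# Minimal sketch of a one-pixel attack via differential evolution.
# Assumptions: `model.predict` takes a batch of images and returns class
# probabilities; `image` is an H x W x 3 float array with values in [0, 1].
import numpy as np
from scipy.optimize import differential_evolution

def apply_perturbation(image, candidate):
    """Copy the image and overwrite one pixel with the candidate's RGB value."""
    x, y, r, g, b = candidate
    perturbed = image.copy()
    perturbed[int(x), int(y)] = (r, g, b)
    return perturbed

def one_pixel_attack(model, image, true_class, lam=1.0, max_iter=100):
    h, w, _ = image.shape

    def fitness(candidate):
        perturbed = apply_perturbation(image, candidate)
        # Confidence assigned to the correct class after the perturbation.
        confidence = model.predict(perturbed[np.newaxis, ...])[0][true_class]
        # Improved fitness: also penalize the size of the pixel change,
        # measured as RMSE over the whole image.
        rmse = np.sqrt(np.mean((perturbed - image) ** 2))
        return confidence + lam * rmse

    # Candidate solution: (x, y, r, g, b) for the pixel to modify.
    bounds = [(0, h - 1), (0, w - 1), (0, 1), (0, 1), (0, 1)]
    result = differential_evolution(fitness, bounds, maxiter=max_iter,
                                    popsize=80, polish=False)
    return apply_perturbation(image, result.x)
```

Because DE only queries the model's output probabilities, the sketch above reflects the black-box nature of the attack: no gradients or internal parameters of the DNN are needed, only repeated evaluations of candidate pixels.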
Rights
© Copyright the author(s)
IN COPYRIGHT:
http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
DISCLAIMER:
The purpose of this statement is to help the public understand how this Item may be used. When there is a (non-standard) License or contract that governs re-use of the associated Item, this statement only summarizes the effects of some of its terms. It is not a License, and should not be used to license your Work. To license your own Work, use a License offered at https://creativecommons.org/
Persistent Identifier
https://archives.pdx.edu/ds/psu/28613
Exploring and Expanding the One-Pixel Attack