First Advisor

Christof Teuscher

Term of Graduation

Summer 2019

Date of Publication

8-8-2019

Degree Name

Doctor of Philosophy (Ph.D.) in Electrical and Computer Engineering


Electrical and Computer Engineering




Neural networks (Computer science), Machine learning, Image processing



Physical Description

1 online resource (xxii, 128 pages)


This dissertation concerns methods for improving the reliability and quality of explanations for decisions based on Neural Networks (NNs). NNs are increasingly part of state-of-the-art solutions in a broad range of fields, including biomedicine, logistics, user-recommendation engines, defense, and self-driving vehicles. While NNs form the backbone of these solutions, they are often viewed as "black box" solutions, meaning the only output offered is a final decision, with no insight into how or why that particular decision was made. For high-stakes fields such as biomedicine, where lives are at risk, it is often more important to be able to explain a decision, so that its underlying assumptions can be verified, than simply to produce one.

Prior methods of explaining NN decisions from images fall into one of two categories: post-hoc analyses and attention networks. Post-hoc analyses, such as Grad-CAM, look at gradient information within the network to identify which regions of an image had the greatest effect on the final decision. Attention networks consist of structural changes to the network, which produce a mask through which the image is filtered before subsequent processing. In both cases, the result is a heatmap highlighting the regions that have the greatest effect on the final decision. This dissertation identifies two flaws with these approaches. First, these methods of explanation change wildly when the network is exposed to adversarial examples. When an imperceptible change to the input results in a significant change in the explanation, how reliable is the explanation? Second, these methods all produce a heatmap, which arguably does not have the definition required to truly understand which features are important. An algorithm that can draw a circle around a cat does not necessarily know that it is looking at a cat; it only recognizes the existence of a salient object.
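To make the notion of an adversarial example concrete, the following is a minimal sketch, not taken from the dissertation. It assumes a hand-written three-feature logistic classifier in place of a neural network, and a sign-of-gradient step as the attack; the weights, the input, and the names `score`, `predict`, and `sign_gradient_attack` are all hypothetical.

```python
# Toy illustration only: a fixed logistic classifier stands in for a neural network.
WEIGHTS = [1.0, -2.0, 0.5]  # hypothetical model weights
BIAS = 0.0


def score(x):
    """Pre-sigmoid score; a positive value means class 1."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    return 1 if score(x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def sign_gradient_attack(x, epsilon):
    """Perturb x by at most epsilon per feature so as to increase the loss for label 1.

    For a logistic model with true label 1, the gradient of the loss with
    respect to x is proportional to -WEIGHTS, so the attack steps along
    -sign(WEIGHTS).
    """
    return [xi - epsilon * sign(w) for w, xi in zip(WEIGHTS, x)]


x = [0.5, 0.2, 0.4]
x_adv = sign_gradient_attack(x, epsilon=0.1)
print("clean prediction:", predict(x))
print("adversarial prediction:", predict(x_adv))
```

In this toy setting, a change of at most 0.1 per feature moves the prediction from class 1 to class 0. Any explanation derived from the model's gradients near such an input is exposed to the same kind of small perturbation.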

To address these flaws, this dissertation explores Sensory Relevance Models (SRMs), methods of explanation which utilize the full richness of the sensory domain. Initially motivated by a study of sparsity, several incarnations of SRMs were evaluated for their ability to resist adversarial examples and provide a more informative explanation than a heatmap.

The first SRM formulation resulted from a study of network bisections, where NNs were split into a pre-processing step (the SRM) and a classifying step. The output of the pre-processing step was made very sparse before being passed to the classifier. Visualizing this sparse, intermediate computation could yield a heatmap-like explanation, and the myriad features at each spatial location of the SRM's output offered the potential for more textured explanations. Two methods of achieving network bisection using auxiliary losses were devised, and both were successful in generating a sparse, intermediate representation which could be interpreted by a human observer. However, even a network bisection SRM which used only 26% of the input image did not result in decreased adversarial attack magnitude. Without solving the adversarial attack issue, any explanation based on the network bisection SRM would be as fragile as previously proposed methods.
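The bisection idea can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's method: the abstract does not say what the two auxiliary losses were, so a mean-absolute-value (L1) penalty is used here as a stand-in, and the helper names `apply_mask`, `fraction_used`, and `l1_penalty` are hypothetical.

```python
def apply_mask(image, mask):
    """Element-wise product of a 2-D image with a same-shaped relevance mask."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def fraction_used(mask, threshold=1e-6):
    """Fraction of mask entries that are effectively non-zero."""
    flat = [m for row in mask for m in row]
    return sum(1 for m in flat if abs(m) > threshold) / len(flat)


def l1_penalty(mask):
    """Assumed auxiliary loss: mean absolute mask value, which favors sparse masks."""
    flat = [m for row in mask for m in row]
    return sum(abs(m) for m in flat) / len(flat)


# A 2x2 toy image and a mask that keeps a single pixel.
image = [[0.2, 0.9], [0.4, 0.1]]
mask = [[0.0, 1.0], [0.0, 0.0]]

sparse_input = apply_mask(image, mask)  # what the classifying step would receive
print(sparse_input, fraction_used(mask), l1_penalty(mask))
```

A quantity like `fraction_used` corresponds to figures such as the 26% of the input image quoted above; the penalty term would be added to the classification loss during training.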

That led to the theory of Adversarial Explanations (AE). Rather than trying to produce an explanation in spite of adversarial examples, it made sense to work with them. For images, adversarial examples result in full-color, high-definition output. If they could be leveraged for explanations, they would solve both of the flaws identified with previous explanation techniques. Through new mathematical techniques, such as a stochastic Lipschitz constraint, and designing new mechanisms for NNs, such as the Half-Huber Rectified Linear Unit, AE were very successful. On ILSVRC 2012, a dataset of 1,281,167 images of size 224x224 comprising 1,000 different classes, the techniques for AE resulted in NNs 2.4x more resistant to adversarial attacks than the previous state-of-the-art, while retaining the same accuracy on clean data and using a smaller network. Explanations generated using AE possessed very discernible features, with a more obvious interpretation when compared to heatmap-based explanations. As AE works with the non-linearities of NNs rather than against them, the explanations are relevant for a much larger neighborhood of inputs. Furthermore, it was demonstrated that the new adversarial examples produced by AE could be annotated and fed back into the training process, yielding further improved adversarial resistance through a Human-In-The-Loop pipeline.
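The abstract names the Half-Huber Rectified Linear Unit without defining it. The sketch below assumes one plausible reading: a ReLU whose kink at zero is replaced by the quadratic segment of a Huber function, so that the slope rises continuously from 0 to 1. The exact form, the parameter name `delta`, and its default value are assumptions and should be checked against the dissertation itself.

```python
def half_huber_relu(x, delta=1.0):
    """Assumed form: zero for x <= 0, quadratic up to delta, linear beyond it."""
    if x <= 0.0:
        return 0.0
    if x <= delta:
        return x * x / (2.0 * delta)
    return x - delta / 2.0


def half_huber_relu_slope(x, delta=1.0):
    """Derivative of the assumed form; it is continuous, unlike the ReLU's step."""
    if x <= 0.0:
        return 0.0
    if x <= delta:
        return x / delta
    return 1.0


for value in (-1.0, 0.5, 1.0, 2.0):
    print(value, half_huber_relu(value), half_huber_relu_slope(value))
```

Under this assumed form, the gradient with respect to the input changes gradually near zero instead of jumping, which is the kind of behavior one would want when adversarial perturbations themselves are used as explanations.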

Altogether, this dissertation demonstrates significant advancements in the field of machine learning, particularly for explaining the decisions of NNs. At the time of publication, AE is an unparalleled technique, producing more reliable, higher-quality explanations for image classification decisions than were previously available. The modifications presented also demonstrate ways in which adversarial attacks might be mitigated, improving the security of NNs. It is my hope that this work provides a basis for future work in the realms of both adversarial resistance and explainable NNs, making algorithms more reliable in fields where accountability matters, such as biomedicine and autonomous vehicles.


In Copyright. URI: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

adversarial explanations preprint.pdf (24529 kB)
Adversarial explanation preprint

Fast and Accurate Sparse Coding preprint.pdf (1088 kB)
Fast and Accurate Sparse Coding preprint