Sponsor
Portland State University. Department of Computer Science
First Advisor
Feng Liu
Term of Graduation
Summer 2025
Date of Publication
7-22-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (Ph.D.) in Computer Science
Department
Computer Science
Language
English
Subjects
Adversarial Attacks, Computer Vision, Deep Learning, Image Quality Assessment, Machine Learning, Perceptual Similarity Metrics
Physical Description
1 online resource (xiii, 88 pages)
Abstract
Perceptual similarity metrics quantitatively evaluate how similar two images appear to human perception. These metrics aim to mimic the human visual system, providing a more accurate assessment of visual similarity than simple pixel-wise comparisons such as ℓp norm distances. This human-like assessment of visual similarity makes the metrics valuable for applications in image compression, restoration, and enhancement, where evaluating perceptual quality is crucial. Perceptual similarity metrics have become progressively more correlated with human judgments of perceptual similarity; however, despite recent advances, the addition of an imperceptible distortion can still compromise these metrics. This dissertation investigates how the magnitude of specific perturbations applied to visual stimuli affects the responses of low-level perceptual similarity metrics, with the goal of determining how imperceptible distortions impact the reliability of these metrics.
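To make the contrast with pixel-wise comparison concrete, the following sketch computes both a plain ℓ2 distance and a learned perceptual distance. It assumes the pip-installable lpips package; the toy tensors and noise level are illustrative only, not data from this dissertation.

    import torch
    import lpips  # pip install lpips

    loss_fn = lpips.LPIPS(net='alex')  # AlexNet-backed LPIPS variant

    # Toy inputs: (N, 3, H, W) tensors scaled to [-1, 1], as LPIPS expects
    img0 = torch.rand(1, 3, 64, 64) * 2 - 1
    img1 = (img0 + 0.03 * torch.randn_like(img0)).clamp(-1, 1)

    pixel_dist = torch.norm(img0 - img1, p=2)   # simple pixel-wise l2 distance
    perceptual_dist = loss_fn(img0, img1)       # learned perceptual distance
    print(pixel_dist.item(), perceptual_dist.item())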
We begin by investigating the robustness of perceptual similarity metrics against an imperceptible geometric distortion that can occasionally occur during image acquisition and preprocessing. Existing perceptual similarity metrics assume that an image and its reference are well aligned; as a result, these metrics are often sensitive to small alignment errors that are imperceptible to the human eye. In this dissertation, we first study the effect of a small misalignment, specifically a small shift between the input and reference images, on existing metrics, and accordingly develop a shift-tolerant similarity metric. We build upon LPIPS, a widely used learned perceptual similarity metric, and explore architectural design considerations that make it robust against imperceptible misalignment. Specifically, we study a wide spectrum of neural network elements, such as anti-aliasing filtering, pooling, striding, padding, and skip connections, and discuss their roles in building a robust metric. Based on these studies, we develop a new deep neural network-based perceptual similarity metric. Our experiments show that our metric is tolerant to imperceptible shifts while remaining consistent with human similarity judgments.
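As an illustration of one such architectural element, below is a minimal PyTorch sketch of anti-aliased downsampling: a fixed low-pass blur applied before striding, so that a one-pixel input shift perturbs the downsampled features far less. The 3x3 binomial kernel and layer shape are assumptions for illustration, not the dissertation's exact design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BlurPool(nn.Module):
        """Anti-aliased downsampling: blur, then stride."""
        def __init__(self, channels, stride=2):
            super().__init__()
            k = torch.tensor([1., 2., 1.])
            k = torch.outer(k, k)
            k = k / k.sum()  # normalized 3x3 binomial (low-pass) kernel
            self.register_buffer('kernel', k[None, None].repeat(channels, 1, 1, 1))
            self.stride = stride
            self.channels = channels

        def forward(self, x):
            x = F.pad(x, (1, 1, 1, 1), mode='reflect')  # avoid border artifacts
            return F.conv2d(x, self.kernel, stride=self.stride,
                            groups=self.channels)  # per-channel blur + stride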
We further extend our investigation by systematically evaluating the robustness of these metrics to imperceptible adversarial perturbations. We call these perturbations adversarial because they are deliberately crafted by an attacker with malicious intent. Following the two-alternative forced-choice (2AFC) experimental design, with two distorted images and one reference image, we perturb the distorted image that is closer to the reference via an adversarial attack until the metric flips its judgment. We first show that all metrics in our study are susceptible to perturbations generated by common adversarial attacks such as FGSM, PGD, and the One-pixel attack. Next, we attack the widely adopted LPIPS metric using spatial-transformation-based adversarial perturbations (stAdv) in a white-box setting to craft adversarial examples that transfer effectively to other similarity metrics in a black-box setting. We also combine the spatial stAdv attack with the ℓ∞-bounded PGD attack to increase transferability and use the resulting adversarial examples to benchmark the robustness of both traditional and recently developed metrics. Our benchmark provides a good starting point for discussion and further research on the robustness of metrics to imperceptible adversarial perturbations.
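The general shape of such an attack is sketched below, assuming a differentiable metric, a single image pair, and pixel values in [0, 1]. The metric, budget, and step size are placeholders, and the dissertation's stAdv-based attacks optimize spatial flow fields rather than this simple ℓ∞-bounded PGD.

    import torch

    def flip_2afc_judgment(metric, x_closer, x_other, ref,
                           eps=2/255, step=0.5/255, iters=50):
        # PGD-style sketch: push the image the metric currently judges
        # closer to `ref` away from it, within an l-infinity ball,
        # until the 2AFC judgment flips.
        x_adv = x_closer.clone().detach()
        for _ in range(iters):
            x_adv.requires_grad_(True)
            # margin is negative while the metric still prefers x_adv
            margin = metric(x_adv, ref) - metric(x_other, ref)
            margin.sum().backward()
            with torch.no_grad():
                x_adv = x_adv + step * x_adv.grad.sign()  # increase d(x_adv, ref)
                x_adv = x_closer + (x_adv - x_closer).clamp(-eps, eps)
                x_adv = x_adv.clamp(0, 1).detach()
                if (metric(x_adv, ref) > metric(x_other, ref)).all():
                    break  # judgment flipped
        return x_adv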
Continuing this line of investigation, we examine the accuracy and robustness of leveraging vision encoders from large foundation models as backbones for perceptual similarity metrics. In deep-learning-based approaches to measuring perceptual similarity, the perceptual similarity score between a distorted image and a reference image is typically computed as a distance measure between features extracted from a pretrained CNN or, more recently, a Transformer network. Often, these intermediate features require further fine-tuning or processing with additional neural network layers to align the final similarity scores with human judgments. To date, most perceptual similarity metrics and quality assessment models based on foundation models have relied primarily on the final layer or the embedding for similarity and quality score estimation. In contrast, this work explores the potential of the intermediate features of these foundation models, which have remained largely unexplored in the design of low-level perceptual similarity metrics. We demonstrate that for the low-level perceptual similarity task, intermediate features are more effective than embeddings as backbone features. Moreover, without requiring any training, metrics based on distance measures between these intermediate features can outperform both traditional and state-of-the-art learned metrics.
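A minimal sketch of this training-free approach follows, using forward hooks to capture intermediate feature maps and averaging a cosine distance over them. Here torchvision's ResNet-50 stands in for a foundation-model encoder, and the tapped stages and distance measure are illustrative assumptions rather than the dissertation's exact configuration.

    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

    feats = {}
    def save(name):
        def hook(module, inp, out):
            feats[name] = out
        return hook

    # Tap intermediate stages rather than the final embedding
    for name in ('layer1', 'layer2', 'layer3'):
        getattr(backbone, name).register_forward_hook(save(name))

    @torch.no_grad()
    def perceptual_distance(img0, img1):
        # Training-free score: mean (1 - cosine similarity) over the
        # intermediate feature maps of a pretrained backbone.
        backbone(img0)
        f0 = {k: v.clone() for k, v in feats.items()}
        backbone(img1)
        d = sum((1 - F.cosine_similarity(f0[k], feats[k], dim=1)).mean()
                for k in f0)
        return d / len(f0)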
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Persistent Identifier
https://archives.pdx.edu/ds/psu/44083
Recommended Citation
Ghildyal, Abhijay, "Alignment of Perceptual Similarity Metrics with Human Perception" (2025). Dissertations and Theses. Paper 6920.