First Advisor

Banafsheh Rekabdar

Term of Graduation

Summer 2025

Date of Publication

10-3-2025

Document Type

Thesis

Degree Name

Master of Science (M.S.) in Computer Science

Department

Computer Science

Language

English

Subjects

Atari, Interpretability, Neural Networks, Neural Pathways, RAM, Reinforcement Learning

Physical Description

1 online resource (ix, 145 pages)

Abstract

While Deep Neural Networks (DNNs) have driven major breakthroughs in artificial intelligence, their internal complexity often makes their behavior hard to explain, resulting in the well-known “black box” dilemma. This thesis addresses the challenge of interpretability in DNNs and deep reinforcement learning (DRL) through two main contributions.

In Part I, we revisit and extend the use of Deep RAM Networks (DRNs) within the Arcade Learning Environment (ALE), showing that, with modern architectures and careful hyperparameter tuning, RAM-based agents can achieve performance competitive with established pixel-based baselines on Atari 2600 games while offering additional advantages for research and analysis. We also train and evaluate a hybrid agent that integrates both RAM and pixel observations, demonstrating that in most games it outperforms agents relying on either modality alone. By leveraging the compact and Markovian nature of RAM observations, DRNs not only act as competitive agents but also enable new forms of analysis, making them particularly well-suited for interpretability studies.
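As a concrete illustration (a minimal sketch, not the thesis's exact architecture), Gymnasium's ALE environments expose the Atari 2600's 128-byte RAM directly via obs_type="ram", and a small fully connected Q-network can act on those bytes. The layer sizes and network name below are illustrative placeholders.

```python
import gymnasium as gym
import torch
import torch.nn as nn

# obs_type="ram" yields the console's 128-byte RAM as the observation.
env = gym.make("ALE/Breakout-v5", obs_type="ram")

class RamQNetwork(nn.Module):
    """Small fully connected Q-network over the 128 RAM bytes (illustrative)."""
    def __init__(self, n_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(128, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, ram_bytes: torch.Tensor) -> torch.Tensor:
        # Scale raw bytes from [0, 255] to [0, 1] before the dense layers.
        return self.net(ram_bytes.float() / 255.0)

obs, _ = env.reset(seed=0)
q_net = RamQNetwork(int(env.action_space.n))
with torch.no_grad():
    q_values = q_net(torch.from_numpy(obs).unsqueeze(0))
action = int(q_values.argmax(dim=1).item())  # greedy action from RAM alone
```

Because the whole observation is 128 bytes rather than a stack of frames, such a network stays compact, which is what makes the pathway-level analysis in Part II tractable.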

In Part II, we dig deeper into agent behavior and DNN internals by introducing two general analysis and interpretability techniques. Trajectory Tracking provides a model-agnostic framework for examining long-term behavioral patterns; applicable to any reinforcement learning agent, it queries basic trajectory attributes, or attributes augmented by the user, within and across episodes. Neural Pathway Decomposition (NP-Decomp) offers a systematic approach for decomposing compact fully connected DNNs into their constituent neural pathways, tracing from input features and biases to final outputs. This yields exact, context-specific attributions of how the collective of neural pathways influences an agent's decisions. While computationally intensive, NP-Decomp reveals structural insights unlikely to be obtained through other analysis techniques.
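To make Trajectory Tracking concrete, here is a minimal sketch under stated assumptions: per-step records carry basic attributes (episode, timestep, action, reward) plus a free-form field for user-augmented attributes, and a simple predicate query filters them within and across episodes. The Step record and query helper are illustrative, not the thesis's API.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    episode: int
    t: int
    action: int
    reward: float
    extras: dict = field(default_factory=dict)  # user-augmented attributes

def query(steps, predicate):
    """Filter step records; works within and across episodes."""
    return [s for s in steps if predicate(s)]

# Toy log spanning two episodes.
log = [
    Step(0, 0, action=1, reward=0.0),
    Step(0, 1, action=3, reward=1.0, extras={"lives": 5}),
    Step(1, 0, action=0, reward=0.0, extras={"lives": 5}),
]

rewarded = query(log, lambda s: s.reward > 0)
print([(s.episode, s.t, s.action) for s in rewarded])  # [(0, 1, 3)]
```

Similarly, the sketch below illustrates the idea behind NP-Decomp on a tiny fully connected ReLU network (the thesis's actual algorithm and data structures are not reproduced here). For a fixed input, the ReLU gates are constant, so each output is exactly the sum of contributions from every pathway running from an input feature or a bias to that output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Tiny network: 3 inputs -> 4 hidden (ReLU) -> 2 outputs.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def forward(x):
    pre = W1 @ x + b1
    gates = (pre > 0).astype(float)  # which ReLUs fire for this input
    return W2 @ (pre * gates) + b2, gates

x = rng.normal(size=3)
y, gates = forward(x)

k = 0  # decompose output unit k
contributions = {}
for j in range(4):          # hidden units
    for i in range(3):      # input features
        contributions[f"x{i}->h{j}->y{k}"] = x[i] * W1[j, i] * gates[j] * W2[k, j]
    contributions[f"b1_{j}->y{k}"] = b1[j] * gates[j] * W2[k, j]
contributions[f"b2_{k}"] = b2[k]

# The decomposition is exact: pathway contributions sum to the output.
assert np.isclose(sum(contributions.values()), y[k])
for path, c in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{path:16s} {c:+.4f}")
```

The assert demonstrates the "exact, context-specific" property: the per-pathway contributions reconstruct the output to floating-point precision, and the combinatorial growth in the number of pathways for larger networks is what makes such a method computationally intensive.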

Applying these methods to DRN agents trained on games such as Breakout, we uncover insights invisible to score-based evaluation alone: the agent's lapses in learned behaviors, the behavioral consequences of epsilon-greedy exploration, and the influence of particular RAM addresses on action selection. Together, Trajectory Tracking and NP-Decomp enable a shift from merely assessing what an agent achieves to understanding how and why it behaves as it does.

By combining compact RAM-based architectures with deep analytical tools, this thesis lays the foundation for more transparent and interpretable reinforcement learning systems. Trajectory Tracking lets us survey the landscape of agent behavior over time, while NP-Decomp extracts analytical core samples: cross-sections through the network's internal structure that reveal not just what decisions an agent makes, but how those decisions emerge from the collective influence of many neural pathways.

Rights

©2025 Andrew J. Wagner

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier

https://archives.pdx.edu/ds/psu/44224
