Sponsor
Portland State University. Department of Computer Science
First Advisor
Banafsheh Rekabdar
Term of Graduation
Fall 2025
Date of Publication
12-9-2025
Document Type
Thesis
Degree Name
Master of Science (M.S.) in Computer Science
Department
Computer Science
Language
English
Subjects
Atari, Mamba, MuJoCo, online fine-tuning, reinforcement learning, sequence modeling
Physical Description
1 online resource (v, 40 pages)
Abstract
Online in-context reinforcement learning enhances offline-trained policies through online fine-tuning. We introduce Online Decision Mamba (ODM), an architecture that replaces the attention mechanism in Online Decision Transformers (ODT) with the Mamba module to improve long-context sequence modeling and overall RL performance. We evaluate ODM on MuJoCo (OpenAI Gym) and Atari benchmarks against state-of-the-art offline and online baselines, including Decision Mamba (DM) and ODT. Our results show that ODM achieves competitive or superior performance, with particularly robust gains when the initial datasets lack expert demonstrations. In the Qbert Atari environment, ODM shows context-length sensitivity similar to that of offline DM; however, we demonstrate that adjusting the initialization range of the Mamba delta parameter effectively mitigates the resulting performance degradation. Further experiments explore the effects of frame stacking, action-embedding dimensionality, exploration strategies, multinomial sampling temperature, pretraining iterations, and replay-buffer size. These findings confirm that ODM is a flexible, high-performance framework for online in-context reinforcement learning, adaptable to diverse tasks and dataset characteristics.
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Persistent Identifier
https://archives.pdx.edu/ds/psu/44398
Recommended Citation
Ruf, Trenton W., "Online Decision Mamba" (2025). Dissertations and Theses. Paper 6983.