First Advisor

Banafsheh Rekabdar

Term of Graduation

Fall 2025

Date of Publication

12-9-2025

Document Type

Thesis

Degree Name

Master of Science (M.S.) in Computer Science

Department

Computer Science

Language

English

Subjects

Atari, Mamba, MuJoCo, online fine-tuning, reinforcement learning, sequence modeling

Physical Description

1 online resource (v, 40 pages)

Abstract

Online in-context reinforcement learning enhances offline-trained policies through online fine-tuning. We introduce Online Decision Mamba (ODM), an architecture that replaces the attention mechanism in Online Decision Transformers (ODT) with the Mamba module to improve long-context sequence modeling and overall RL performance. We evaluate ODM in depth on MuJoCo (OpenAI Gym) and Atari benchmarks, comparing it against state-of-the-art offline and online baselines, including Decision Mamba (DM) and ODT. Our results show that ODM achieves competitive or superior performance, with particularly robust gains when initial datasets lack expert demonstrations. In the Qbert Atari environment, ODM shows context-length sensitivity similar to offline DM; however, we demonstrate that adjusting the Mamba delta-parameter initialization range effectively mitigates the resulting performance degradation. Further experiments explore the effects of frame stacking, action-embedding dimensionality, exploration strategies, multinomial sampling temperature, pretraining iterations, and replay-buffer size. These findings confirm that ODM is a flexible, high-performance framework for online in-context reinforcement learning, adaptable to diverse tasks and dataset characteristics.

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier

https://archives.pdx.edu/ds/psu/44398
