Deep Hierarchical Variational Autoencoders for World Models in Reinforcement Learning

Published In

2023 Fifth International Conference on Transdisciplinary AI (TransAI)

With the increasing demand for sample-efficient and robust reinforcement learning agents, particularly in intricate domains such as robotics, healthcare, and gaming, there is a strong need to minimize the computational overhead caused by interactions between real and virtual agents. This calls for highly accurate models that simulate virtual agents and limit the number of such interactions. To this end, model-based reinforcement learning (MBRL) has proven effective at modeling the environment, yielding better decision-making and higher learning efficiency. A well-known MBRL approach is World Models, which uses a Variational Autoencoder (VAE) as its generative component. This VAE has a relatively simple architecture with limited capacity for complex image inputs, so its image reconstruction error is high, and recent research on VAEs has likewise reported poor reconstruction quality. This paper proposes World Models based on the Nouveau VAE (NVAE) to address these limitations. NVAE, whose architecture employs deep convolutions, serves as the visual sensory component of World Models and encodes the environment dynamics into a latent representation. We show that NVAE-based World Models perform exceptionally well in the dream environment of CarRacing-v2 (an OpenAI Gym environment), improving the agent's performance by 45%. We then demonstrate that NVAE-based World Models can be applied to robotic simulation environments such as panda-gym, where the agent achieved a 95% success rate on the reach task.


