First Advisor

Christof Teuscher

Term of Graduation

Summer 2023

Date of Publication


Document Type


Degree Name

Master of Science (M.S.) in Computer Science


Computer Science




Centralized Training Decentralized Execution, Multi-agent reinforcement learning, multi-agent search, multi-agent systems, radiation localization, target localization



Physical Description

1 online resource (xvi, 88 pages)


For the safety of both equipment and human life, it is important to identify the location of orphaned radioactive material as quickly and accurately as possible. Many factors make radiation localization challenging, such as low gamma radiation signal strength and the need to search unknown environments without prior information. The inverse-square relationship between radiation intensity and distance from the source, the probabilistic nature of nuclear decay and gamma ray detection, and the pervasive presence of naturally occurring environmental radiation complicate localization tasks. Obstructions in complex environments can further attenuate the signal from radioactive material. Existing localization methods, such as those described by Anderson et al. (2022), that use data fusion, stationary node placement optimization, or path planning rely on pre-existing knowledge of the environment, which reduces flexibility. Generalizable localization methods that can localize low-signal targets are needed. To create these methods, the research community has shown increasing interest in self-teaching autonomous solutions, primarily focused on single-searcher architectures that use a branch of machine learning called reinforcement learning and, more recently, multi-agent reinforcement learning. Current single-agent reinforcement learning solutions for localization tasks include a double deep Q-learning approach by Liu et al. (2019), an actor-critic method augmented with a particle filter embedded in a neural network by Proctor et al. (2021), and a deep Q-learning method that is also augmented by a particle filter by Zhao et al. (2022).
The inclusion of multiple learning agents, however, adds complexity, and single-agent algorithms cannot be directly extended into effective multi-agent frameworks for localization tasks because of scalability, environment non-stationarity, and credit assignment challenges. Current multi-agent approaches include architectures by Alagha et al. (2022, 2023) that use centralized training decentralized execution actor-critic methods and unique learning paradigms, such as demonstration cloning. Current multi-agent methods, while effective, fail to address complex environments where signal-blocking obstructions and signal noise are present, and they require significantly more training time than single-agent architectures.
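The measurement challenges described above (inverse-square falloff, the probabilistic nature of detection, and background radiation) can be illustrated with a minimal sketch. This is not the simulation used in the thesis: the function names, the source strength, the background rate, and the minimum-distance clamp are all hypothetical values chosen for illustration, and attenuation by obstructions is not modeled.

```python
import numpy as np


def expected_counts(source_xy, detector_xy, source_strength, background_rate,
                    min_dist=0.1):
    """Mean detector reading: inverse-square signal plus constant background.

    All parameters are illustrative assumptions, not values from the thesis.
    """
    dx = detector_xy[0] - source_xy[0]
    dy = detector_xy[1] - source_xy[1]
    # Clamp the distance so the mean stays finite when the detector
    # sits on top of the source.
    dist = max(float(np.hypot(dx, dy)), min_dist)
    return source_strength / dist**2 + background_rate


def sample_counts(rng, mean_counts):
    """Draw one noisy reading; counting statistics are modeled as Poisson."""
    return int(rng.poisson(mean_counts))


rng = np.random.default_rng(0)
source = (0.0, 0.0)

# Mean readings 1 m and 10 m from a hypothetical source.
near_mean = expected_counts(source, (1.0, 0.0), 1000.0, 25.0)
far_mean = expected_counts(source, (10.0, 0.0), 1000.0, 25.0)

# Noisy readings an agent would actually observe at each position.
near_reading = sample_counts(rng, near_mean)
far_reading = sample_counts(rng, far_mean)
```

With these assumed numbers the mean reading falls from 1025 counts at 1 m to 35 counts at 10 m, of which 25 are background, which shows why a distant source is difficult to distinguish from environmental radiation.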

In this work, I present RAD-TEAM, a multi-agent deep reinforcement learning approach to radiation localization. RAD-TEAM is an on-policy, model-free, policy gradient deep reinforcement learning framework that supports an arbitrary number of agents that autonomously coordinate their efforts to find a source of nuclear radiation at an unknown location. Agents use proximity sensors to detect obstructions in their immediate vicinity, simulating operation in heavy smoke or turbid environments, and must learn teamwork on an individual level, each training its own decentralized control policy. RAD-TEAM is compatible with two different approaches to mitigating the scalability, credit assignment, and non-stationarity challenges present in multi-agent reinforcement learning settings. The first is a common control strategy called centralized training decentralized execution (CTDE), where agents restore the perception of a stationary environment through the use of a global critic and a team reward. The second is a unique multi-agent localization approach called broadcasted training broadcasted execution (BTBE), where agents use a local critic and an individual reward structure but broadcast their findings to other agents in order to restore the perception of a stationary training environment. Agents generalize their decision making through the use of convolutional neural networks and stabilize training with proximal policy optimization, making them adaptable to previously unknown environment dynamics.
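As a rough illustration of how proximal policy optimization stabilizes training, the sketch below computes the standard clipped surrogate objective for a small batch. It is a generic NumPy sketch, not RAD-TEAM's implementation: the function name, the sample probabilities and advantages, and the clipping value of 0.2 (a commonly used default) are assumptions.

```python
import numpy as np


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    new_logp, old_logp: log-probabilities of the taken actions under the
    updated and the data-collecting policy. clip_eps is an assumed default.
    """
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the elementwise minimum removes any incentive to move the
    # policy further than the clip range allows in a single update.
    return float(np.mean(np.minimum(unclipped, clipped)))


# Hypothetical batch of two actions: one with positive advantage whose
# probability rose from 0.5 to 0.75, one with negative advantage whose
# probability fell from 0.5 to 0.25.
old_logp = np.log(np.array([0.5, 0.5]))
new_logp = np.log(np.array([0.75, 0.25]))
advantages = np.array([1.0, -1.0])

objective = ppo_clip_objective(new_logp, old_logp, advantages)
```

In this example the probability ratios are 1.5 and 0.5, both outside the range [0.8, 1.2], so both terms are clipped and the objective is (1.2 - 0.8) / 2 = 0.2. Because a clipped term has zero gradient with respect to the policy parameters, these samples would not push the policy any further in the same update.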

In this thesis, the effects that different team sizes and multi-agent control strategies have on performance are investigated. The CTDE and BTBE strategies are applied in identical simulation environments in order to directly compare the speed and accuracy with which the teams localize orphaned radiation sources. To evaluate the CTDE and BTBE methods, RAD-TEAM teams of varying sizes are tested in randomized 15 m x 15 m obstructed and unobstructed simulated environments. RAD-TEAM performance is compared with RAD-A2C, an existing single-agent architecture by Proctor et al. (2021), in a decentralized training decentralized execution (DTDE) control mode. Current testing indicates that in unobstructed environments, four-agent BTBE teams outperform other control strategies in all tests, suggesting that broadcasting location and radiation intensity information effectively mitigates non-stationarity issues without the need for the global critic used in CTDE. In small environments where obstacles are present, teams of four RAD-TEAM agents that use the BTBE control strategy successfully localize radiation sources with a 60% median success rate, while one-agent teams, two-agent teams, and all CTDE RAD-TEAM teams fail entirely to locate the source with statistically significant reliability. In unobstructed environments, RAD-TEAM matches RAD-A2C in accuracy; however, results indicate that RAD-TEAM agents may perform better on the speed metric, reducing localization time by 5-8 timesteps. These results suggest that multi-agent teams benefit from coordination and that multi-agent BTBE RAD-TEAM methods may offer improved solutions for radiation localization, helping to reduce the extreme risk that orphaned radioactive materials pose to human safety.


In Copyright. URI: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier