First Advisor

Christof Teuscher

Term of Graduation

Summer 2021

Date of Publication


Document Type


Degree Name

Master of Science (M.S.) in Electrical and Computer Engineering


Electrical and Computer Engineering





Physical Description

1 online resource (xvi, 86 pages)


Rapid search for and localization of lost nuclear sources in a given area of interest is an important task for the safety of society and the reduction of human harm. Detection, localization, and identification are based upon the gamma radiation spectrum measured by a radiation detector. The nonlinear relationship of electromagnetic wave propagation, paired with the probabilistic nature of gamma ray emission and background radiation from the environment, leads to ambiguity in the estimation of a source's location. In the case of a single mobile detector, there are numerous challenges to overcome, such as weak source activity, multiple sources, or the presence of obstructions (i.e., a non-convex environment). Detectors deployed on smaller autonomous systems such as drones or robots have smaller surface area and volume, resulting in worse counting statistics per dwell time. Additionally, search algorithms need to be efficient and generalizable to operate across a variety of scenarios.

The motivation of this work is to investigate the sequential decision-making capability of deep reinforcement learning (DRL) in the nuclear source search context. We focus on a branch of DRL known as stochastic, model-free, on-policy gradient methods, which learn strictly through interaction with an environment to develop a useful policy for a specified goal. We propose a novel neural network architecture (RAD-A2C) based on the advantage actor-critic (A2C) framework that uses a gated recurrent unit (GRU) for action selection and a particle filter gated recurrent unit (PFGRU) for localization.

Performance is studied in randomized 22 x 22 m convex and non-convex simulated environments across a range of signal-to-noise ratios (SNRs) for a single detector and single source. The RAD-A2C performance is compared to both an information-driven controller that uses a bootstrap particle filter (BPF) and a gradient search (GS) algorithm. We find that the RAD-A2C has comparable performance to the information-driven controller across SNRs in a convex environment, at lower computational complexity per action. The RAD-A2C far outperforms the GS algorithm in the non-convex environment, with a median completion rate greater than 95%.


In Copyright. URI: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier