First Advisor

John Lipor

Date of Publication

Fall 1-14-2020

Document Type


Degree Name

Master of Science (M.S.) in Electrical and Computer Engineering


Electrical and Computer Engineering




Active learning, Reinforcement learning, Sampling



Physical Description

1 online resource (ix, 52 pages)


A fundamental challenge to modern science and engineering is the ability to rapidly and accurately determine the spatial extent of environmental phenomena. In monitoring the spread of hazardous pollution, for example, all points with pollutant concentration above or below a fixed threshold can be considered as two classes in a binary classification problem. In this instance, the goal is to accurately estimate the decision boundary as quickly as possible. To generate models and predictions, scientists must choose their sampling locations from a vast array of possibilities. This thesis develops a policy for determining the optimal sample locations for a fixed number of samples.

The motivating scenario for this work is that of determining the spatial extent of particulate matter from a wildfire with an autonomous aerial vehicle. Algorithms designed to rapidly determine a decision boundary fall within the category of active learning or adaptive sampling. These methods typically try to maximize information gain per sample, but will be accompanied by potentially dramatic drawbacks in terms of sampling costs like distance or time. Meanwhile, state-of-the-art methods that balance the above costs by sampling a certain fraction into the remaining interval at each step do not guarantee to find the optimal search procedure.

For the first situation we consider, a one-dimensional step function, we propose a finite-horizon sampling procedure that optimally balances the distance traveled during the search with the final estimation entropy. We show the resulting cost for a uniform probability distribution with noiseless measurements can be optimized in closed form, and derive the expected number of samples necessary to fall below a given final estimation error. We show our method is suitable for both distance- and time-penalized search procedures, and demonstrate that the resulting policy generalizes and improves upon existing approaches to this problem. Empirical results demonstrate that our sampling strategy outperforms existing approaches by up to 35% and agrees with our analytical predictions in terms of the resulting distance traveled and average interval size.

To extend our proposed method to two-dimensional searches, we show how a series of sequential one-dimensional transect searches can be combined to estimate a spatial boundary, assuming we have some known statistics about the function we are modeling. We demonstrate how the results from each successive search can be used to update the estimated boundary and select where to start sampling for the next search. We also illustrate the tradeoff between the number of transect searches performed and the number of samples per search when constrained by a fixed number of total samples. We find that the optimal allocation lies somewhere between the maximum number of steps and maximum number of samples.

Finally, we introduce and implement four popular methods for solving reinforcement learning problems: dynamic programming, Q-learning, deep Q networks, and rollout. Starting with a uniform distribution on the change point of a step function, we show how formalizing our search as a Markov decision process yields an optimal policy through model-based dynamic programming, which we benchmark our three model-free algorithms against. We then approach a scenario with a more complicated model, where the change points are drawn from a nonuniform distribution. In both scenarios, rollout is the fastest model-free method while a deep Q network performs best. All algorithms improve upon existing approaches, ranging from 4% to 23% improvement.

Considering future work, we propose a number of algorithmic extensions and improvements to our models, as well as a few considerations for potential further investigation. The contributions of this thesis are an optimal search policy for a distance- or time-penalized one-dimensional search, an extension of this policy to a two-dimensional boundary, and the use of reinforcement learning methods to derive optimal policies without prior knowledge of the change point's distribution.


In Copyright. URI: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier