First Advisor

Melanie Mitchell

Term of Graduation

Fall 2021

Degree Name

Doctor of Philosophy (Ph.D.) in Computer Science


Computer Science




Pattern recognition systems, Computer vision, Deep learning (Machine learning)



Physical Description

1 online resource (vi, 140 pages)


Computer vision and machine learning systems have improved significantly in recent years, driven largely by the development of deep learning, leading to impressive performance on object-detection tasks. Understanding the content of images is considerably more difficult. Even simple situations, such as "a handshake", "walking the dog", "a game of ping-pong", or "people waiting for a bus", present significant challenges. Each consists of common objects, yet none is reliably detectable either as a single entity or through the simple co-occurrence of its parts.

In this dissertation, toward the goal of developing machine learning systems that demonstrate properties associated with understanding, I will describe a novel system for performing visual situation recognition. Given a description of a situation and a small labeled training set, the system, called Situate, learns object appearance models as well as a probabilistic model capturing the situation's expected spatial relationships. Given a new image, Situate uses its learned models and an array of agents to actively search its input for the most consistent correspondence between the model of the situation and the content of the image. Each agent develops a possible correspondence between the model and the input, while Situate allocates computational resources among the agents so that promising solutions are developed early but alternative correspondences are not ignored.
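The abstract does not say how Situate divides its computation among agents, only that promising correspondences should be developed early while alternatives still receive attention. One simple way to get that behavior is score-proportional scheduling with a minimum weight, sketched below. This is a hypothetical illustration, not Situate's actual mechanism: `Agent`, `choose_agent`, `run`, `floor`, and the toy scoring function `toy_improve` (with its `HIDDEN_QUALITY` table standing in for real image evidence) are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent developing one candidate correspondence between
    the situation model and the image."""
    name: str
    score: float = 0.1  # current support for this correspondence, in [0, 1]
    steps: int = 0      # work steps this agent has been allocated so far


def choose_agent(agents, rng, floor=0.05):
    """Pick the next agent to run with probability proportional to its score.
    The `floor` keeps every agent's weight above zero, so low-scoring
    alternatives still get occasional attention instead of being starved."""
    weights = [max(a.score, floor) for a in agents]
    return rng.choices(agents, weights=weights, k=1)[0]


def run(agents, improve, budget=1000, seed=0):
    """Spend `budget` steps; each step lets one chosen agent refine its
    correspondence via `improve`, which returns the agent's new score."""
    rng = random.Random(seed)
    for _ in range(budget):
        agent = choose_agent(agents, rng)
        agent.steps += 1
        agent.score = improve(agent, rng)
    return max(agents, key=lambda a: a.score)


# Toy stand-in for real image evidence (an assumption for illustration):
# each hypothesis has a hidden quality, and working on it moves its score
# toward that quality, plus a little Gaussian noise.
HIDDEN_QUALITY = {"A": 0.9, "B": 0.3, "C": 0.6}


def toy_improve(agent, rng):
    target = HIDDEN_QUALITY[agent.name]
    new = agent.score + 0.1 * (target - agent.score) + rng.gauss(0, 0.01)
    return min(1.0, max(0.0, new))  # keep the score in [0, 1]


if __name__ == "__main__":
    agents = [Agent("A"), Agent("B"), Agent("C")]
    best = run(agents, toy_improve, budget=1000, seed=0)
    print(best.name, {a.name: a.steps for a in agents})
```

In this toy run the strongest hypothesis ends up with the most steps, but the weaker ones are never starved entirely; raising `floor` shifts the balance toward exploring alternatives.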

I will compare Situate to a more traditional computer vision approach that relies on detecting a situation's constituent objects, as well as to a related image-retrieval system based on "scene graphs". I will evaluate each method on the situation recognition task and in the context of image retrieval. The results demonstrate the value of a feedback loop between image content and a model of that content.
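The abstract does not name the retrieval metrics used. A common setup, assumed here, is that each method assigns every image a score for a query situation, images are ranked by that score, and the ranks of the true instances are summarized. The helpers `rank_images` and `retrieval_metrics`, the choice of median rank and recall@k, and the scores below are all hypothetical; the numbers are made up for illustration and are not results from the dissertation.

```python
from statistics import median


def rank_images(scores):
    """Order image ids from most to least situation-like by score."""
    return sorted(scores, key=lambda img: scores[img], reverse=True)


def retrieval_metrics(scores, positives, k=2):
    """Median rank of the true situation instances, plus recall@k."""
    ranking = rank_images(scores)
    ranks = [ranking.index(img) + 1 for img in positives]  # 1-based ranks
    recall_at_k = sum(r <= k for r in ranks) / len(positives)
    return {"median_rank": median(ranks), f"recall@{k}": recall_at_k}


# Made-up scores for illustration only; not results from the dissertation.
scores = {"img1": 0.92, "img2": 0.15, "img3": 0.78, "img4": 0.40, "img5": 0.05}
print(retrieval_metrics(scores, positives=["img1", "img4"]))
# → {'median_rank': 2.0, 'recall@2': 0.5}
```

Here the two true instances land at ranks 1 and 3, so their median rank is 2.0 and only one of them falls within the top 2.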


In Copyright. URI: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
