First Advisor

Feng Liu

Term of Graduation

Fall 2022

Date of Publication


Document Type


Degree Name

Doctor of Philosophy (Ph.D.) in Computer Science


Computer Science





Physical Description

1 online resource (xi, 113 pages)


Deep neural networks have been part of many breakthroughs in computer graphics and vision research. In the context of visual content synthesis, deep learning models have achieved impressive performance in the image domain. However, adapting the successes of image synthesis models to the video domain has been difficult, arguably due to the lack of sufficiently strong inductive biases that encourage the models to capture the temporal dynamics of video data. Inductive bias refers to the prior knowledge incorporated into a learning model to explicitly drive the learning process toward solutions that capture meaningful structures in the data, which is critical for helping the model generalize beyond the training data. Successful deep neural network architectures, such as convolutional neural networks (CNNs), while effective in representing image data thanks to their spatial inductive bias, often lack inductive biases relating to the dynamic nature of videos. I argue that designing such inductive biases can benefit from the domain knowledge of the video processing literature. My primary motivation in this thesis is to demonstrate that the knowledge acquired from the traditional computer vision and graphics literature can serve as effective inductive biases for designing deep learning models for video synthesis. This dissertation provides the initial steps toward verifying that insight via two case studies.

In the first case study, I explored adapting the standard CNN architecture to perform video frame interpolation. Early CNN-based methods for frame generation followed the direct prediction approach and were thus ineffective at learning to capture motion information, which often resulted in visual distortions and blurry results. Inspired by traditional video frame interpolation techniques that established frame interpolation as a joint process of motion estimation and pixel re-sampling, I presented a CNN-based frame interpolation framework that incorporates this insight into the synthesis model via the novel AdaConv layer. This layer serves as a functional inductive bias and enables the first deep learning model for high-quality video frame interpolation.
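
As a rough illustration of how motion estimation and pixel re-sampling can be folded into a single per-pixel convolution, the sketch below applies a separate pair of kernels to the patches around each output pixel in two input frames. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the function name `adaptive_conv_interpolate`, the grayscale list-of-lists frame format, the odd kernel size, and the border clamping are all hypothetical choices made for illustration, and the kernels are passed in by hand here whereas a learned model would predict them.

```python
def adaptive_conv_interpolate(frame1, frame2, kernels):
    """Synthesize one frame from two grayscale frames with per-pixel kernels.

    frame1, frame2: H x W lists of floats (hypothetical input format).
    kernels[y][x]: a pair (k1, k2) of k x k weight grids, k odd. In a learned
    system a network would predict these; here they are supplied by hand.
    """
    h, w = len(frame1), len(frame1[0])
    k = len(kernels[0][0][0])
    r = k // 2

    def sample(frame, y, x):
        # Clamp coordinates so patches near the border stay inside the frame.
        y = min(max(y, 0), h - 1)
        x = min(max(x, 0), w - 1)
        return frame[y][x]

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            k1, k2 = kernels[y][x]
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Where the weight sits inside the kernel encodes motion;
                    # its magnitude encodes the re-sampling (blending) weight.
                    acc += k1[dy + r][dx + r] * sample(frame1, y + dy, x + dx)
                    acc += k2[dy + r][dx + r] * sample(frame2, y + dy, x + dx)
            out[y][x] = acc
    return out


def constant_kernels(h, w, k1, k2):
    """Helper for the sketch: use the same kernel pair at every pixel."""
    return [[(k1, k2) for _ in range(w)] for _ in range(h)]
```

With both kernels set to 0.5 at the center, the result is a plain average of the two frames. Moving the single non-zero weight off-center makes each output pixel read from a displaced location, which is how one kernel can express both where to sample and how to blend.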

In the second case study, I explored adapting the recent Implicit Neural Representation (INR) to a novel motion-adjustable video representation. Viewing modern INR frameworks as a form of non-linear transform from a frequency domain to the image domain, and inspired by the success of phase-based motion modelling in the classical computer vision literature, I presented a simple modification to the standard image-based INR model that allows for not only video reconstruction but also a variety of motion editing tasks.
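
The link between phase and motion that motivates this case study can be illustrated with the classical Fourier shift property: for a signal written as a sum of sinusoids, translating the signal is equivalent to offsetting each component's phase in proportion to its frequency. The sketch below shows only that general principle on a 1-D signal. It does not reproduce the dissertation's model, and the names `evaluate` and `shift_phases` are hypothetical.

```python
import math


def evaluate(x, amps, freqs, phases):
    """Evaluate f(x) = sum_i a_i * sin(w_i * x + p_i)."""
    return sum(a * math.sin(w * x + p) for a, w, p in zip(amps, freqs, phases))


def shift_phases(freqs, phases, d):
    """Return phases for which the signal appears translated by d.

    f(x - d) = sum_i a_i * sin(w_i * x + (p_i - w_i * d)), so each phase
    is offset by -w_i * d while amplitudes and frequencies stay unchanged.
    """
    return [p - w * d for w, p in zip(freqs, phases)]
```

For example, evaluating the signal with `shift_phases(freqs, phases, 0.4)` at any `x` gives the same value as evaluating the original signal at `x - 0.4`, so motion can be edited by changing phases alone.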


In Copyright. URI: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier