Sponsor
Portland State University. Department of Computer Science
First Advisor
Feng Liu
Term of Graduation
Fall 2022
Date of Publication
11-17-2022
Document Type
Dissertation
Degree Name
Doctor of Philosophy (Ph.D.) in Computer Science
Department
Computer Science
Language
English
Subjects
Image processing -- Digital techniques, Computer vision, Deep learning (Machine learning)
DOI
10.15760/etd.8106
Physical Description
1 online resource (xi, 113 pages)
Abstract
Deep neural networks have been part of many breakthroughs in computer graphics and vision research. In the context of visual content synthesis, deep learning models have achieved impressive performance in the image domain. However, adapting the successes of image synthesis models to the video domain has been difficult, arguably due to the lack of sufficiently strong inductive biases that encourage the models to capture the temporal-dynamic nature of video data. Inductive bias refers to prior knowledge incorporated into a learning model to explicitly drive the learning process toward solutions that capture meaningful structure in the data, which is critical for generalizing beyond the training data. Successful deep neural network architectures, such as convolutional neural networks (CNNs), while effective at representing image data thanks to their spatial inductive bias, often lack inductive biases suited to the dynamic nature of videos. Mai argues that designing such inductive biases can benefit from the domain knowledge of the video processing literature. Mai's primary motivation in this thesis is to demonstrate that knowledge acquired from the traditional computer vision and graphics literature can serve as effective inductive biases for designing deep learning models for video synthesis. This dissertation provides initial steps toward verifying that insight via two case studies.
In the first case study, Mai explored adapting the standard CNN architecture to perform video frame interpolation. Early CNN-based methods for frame generation followed a direct-prediction approach and were therefore ineffective at capturing motion information. Inspired by traditional video frame interpolation techniques that cast frame interpolation as a joint process of motion estimation and pixel resampling, Mai presented a CNN-based frame interpolation framework that incorporates this insight into the synthesis model via the novel AdaConv layer. This layer serves as a functional inductive bias and enabled the first deep learning model for high-quality video frame interpolation.
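To make the adaptive-convolution idea concrete, the following sketch applies a per-pixel kernel, assumed to be predicted by an upstream CNN, to co-located patches of the two input frames, so that motion estimation and pixel resampling happen in a single weighted sum. This is a minimal PyTorch illustration of the general technique under assumed tensor shapes, not the dissertation's implementation; the function name adaptive_conv_synthesis is hypothetical.

    import torch
    import torch.nn.functional as F

    def adaptive_conv_synthesis(frame1, frame2, kernels):
        # Hypothetical sketch of AdaConv-style synthesis, not the thesis code.
        # frame1, frame2: (B, C, H, W) input frames.
        # kernels: (B, 2*k*k, H, W) per-pixel weights over a k x k patch
        #          in each input frame, predicted by some kernel CNN.
        b, c, h, w = frame1.shape
        taps = kernels.shape[1] // 2        # k*k taps per frame
        k = int(taps ** 0.5)

        # Gather a k x k patch around every output pixel: (B, C*k*k, H*W).
        p1 = F.unfold(frame1, kernel_size=k, padding=k // 2).view(b, c, taps, h, w)
        p2 = F.unfold(frame2, kernel_size=k, padding=k // 2).view(b, c, taps, h, w)

        w1 = kernels[:, :taps].view(b, 1, taps, h, w)
        w2 = kernels[:, taps:].view(b, 1, taps, h, w)

        # One weighted sum performs both motion estimation and resampling.
        return (p1 * w1).sum(dim=2) + (p2 * w2).sum(dim=2)

    # Usage with toy shapes; real systems predict much larger kernels.
    f1, f2 = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
    ker = torch.softmax(torch.rand(1, 2 * 5 * 5, 32, 32), dim=1)  # normalized taps
    middle = adaptive_conv_synthesis(f1, f2, ker)                 # (1, 3, 32, 32)

Normalizing the taps (here with a softmax) keeps each synthesized pixel a convex combination of input pixels, a common choice in kernel-prediction interpolation work.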
In the second case study, Mai explored adapting the recent Implicit Neural Representation (INR) into a novel motion-adjustable video representation. Viewing modern INR frameworks as a form of non-linear transform from a frequency domain to the image domain, and inspired by the success of phase-based motion modeling in the classical computer vision literature, Mai presented a simple modification to the standard image-based INR model that allows not only video reconstruction but also a variety of motion-editing tasks.
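As a rough illustration of the phase-based idea, the sketch below gives each sinusoid in a Fourier-feature coordinate MLP a time-dependent phase shift, so that varying the time input moves image content. The class name, the linear phase trajectory, and all hyperparameters are assumptions chosen for brevity, not the representation proposed in the dissertation.

    import torch
    import torch.nn as nn

    class PhaseShiftedINR(nn.Module):
        # Hypothetical sketch: a Fourier-feature MLP whose input sinusoids
        # receive a learned per-frequency phase shift that grows with time.
        def __init__(self, num_freqs=256, hidden=256):
            super().__init__()
            self.freqs = nn.Parameter(torch.randn(num_freqs, 2) * 10.0)
            self.phase_velocity = nn.Parameter(torch.zeros(num_freqs))
            self.mlp = nn.Sequential(
                nn.Linear(2 * num_freqs, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3),  # RGB output
            )

        def forward(self, xy, t):
            # xy: (N, 2) coordinates in [-1, 1]; t: scalar time.
            angle = xy @ self.freqs.t() + t * self.phase_velocity
            feats = torch.cat([torch.sin(angle), torch.cos(angle)], dim=-1)
            return self.mlp(feats)

    model = PhaseShiftedINR()
    xy = torch.rand(1024, 2) * 2 - 1
    rgb_now = model(xy, 0.0)      # reconstruct a frame
    rgb_next = model(xy, 0.5)     # same pixels with shifted phases

Under these assumptions, fitting the model to a clip and then rescaling phase_velocity at inference time would exaggerate or attenuate the represented motion, hinting at how phase manipulation in an INR enables motion editing.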
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Persistent Identifier
https://archives.pdx.edu/ds/psu/39181
Recommended Citation
Mai, Long, "Domain Knowledge as Motion-Aware Inductive Bias for Deep Video Synthesis: Two Case Studies" (2022). Dissertations and Theses. Paper 6247.
https://doi.org/10.15760/etd.8106