Sponsor
Portland State University. Department of Computer Science
First Advisor
Feng Liu
Term of Graduation
Fall 2022
Date of Publication
11-17-2022
Document Type
Dissertation
Degree Name
Doctor of Philosophy (Ph.D.) in Computer Science
Department
Computer Science
Language
English
Subjects
Image processing -- Digital techniques, Computer vision, Deep learning (Machine learning)
DOI
10.15760/etd.8106
Physical Description
1 online resource (xi, 113 pages)
Abstract
Deep neural networks have been part of many breakthroughs in computer graphics and vision research. In the context of visual content synthesis, deep learning models have achieved impressive performance in the image domain. However, adapting the successes of image synthesis models to the video domain has been difficult, arguably due to the lack of sufficiently strong inductive biases that encourage the models to capture the temporal-dynamic nature of video data. Inductive bias refers to prior knowledge incorporated into a learning model to explicitly drive the learning process toward solutions that capture meaningful structure in the data, which is critical for generalizing beyond the training data. Successful deep neural network architectures, such as convolutional neural networks (CNNs), while effective at representing image data thanks to their spatial inductive bias, often lack inductive biases suited to the dynamic nature of videos. Mai argues that designing such inductive biases can benefit from the domain knowledge of the video processing literature. Mai's primary motivation in this thesis is to demonstrate that knowledge acquired from the traditional computer vision and graphics literature can serve as effective inductive biases for designing deep learning models for video synthesis. This dissertation provides initial steps toward verifying that insight via two case studies.
In the first case study, Mai explored adapting the standard CNN architecture to perform video frame interpolation. Early CNN-based methods for frame generation followed a direct-prediction approach and were therefore ineffective at capturing motion information. Inspired by traditional video frame interpolation techniques that cast frame interpolation as a joint process of motion estimation and pixel resampling, Mai presented a CNN-based frame interpolation framework that incorporates this insight into the synthesis model via the novel AdaConv layer. This layer serves as a functional inductive bias and enabled the first deep learning model for high-quality video frame interpolation.
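To make the adaptive-convolution idea concrete, the following sketch applies a per-pixel kernel, assumed to be predicted by an upstream CNN, to co-located patches of the two input frames, so that motion estimation and pixel resampling happen in a single weighted sum. This is a minimal PyTorch illustration of the general technique under assumed tensor shapes, not the dissertation's implementation; the function name adaptive_conv_synthesis is hypothetical.

    import torch
    import torch.nn.functional as F

    def adaptive_conv_synthesis(frame1, frame2, kernels):
        # Hypothetical sketch of AdaConv-style synthesis, not the thesis code.
        # frame1, frame2: (B, C, H, W) input frames.
        # kernels: (B, 2*k*k, H, W) per-pixel weights over a k x k patch
        #          in each input frame, predicted by some kernel CNN.
        b, c, h, w = frame1.shape
        taps = kernels.shape[1] // 2        # k*k taps per frame
        k = int(taps ** 0.5)

        # Gather a k x k patch around every output pixel: (B, C*k*k, H*W).
        p1 = F.unfold(frame1, kernel_size=k, padding=k // 2).view(b, c, taps, h, w)
        p2 = F.unfold(frame2, kernel_size=k, padding=k // 2).view(b, c, taps, h, w)

        w1 = kernels[:, :taps].view(b, 1, taps, h, w)
        w2 = kernels[:, taps:].view(b, 1, taps, h, w)

        # One weighted sum performs both motion estimation and resampling.
        return (p1 * w1).sum(dim=2) + (p2 * w2).sum(dim=2)

    # Usage with toy shapes; real systems predict much larger kernels.
    f1, f2 = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
    ker = torch.softmax(torch.rand(1, 2 * 5 * 5, 32, 32), dim=1)  # normalized taps
    middle = adaptive_conv_synthesis(f1, f2, ker)                 # (1, 3, 32, 32)

Normalizing the taps (here with a softmax) keeps each synthesized pixel a convex combination of input pixels, a common choice in kernel-prediction interpolation work.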
In the second case study, Mai explored adapting the recent Implicit Neural Representation (INR) into a novel motion-adjustable video representation. Viewing modern INR frameworks as a form of non-linear transform from a frequency domain to the image domain, and inspired by the success of phase-based motion modeling in the classical computer vision literature, Mai presented a simple modification to the standard image-based INR model that allows not only video reconstruction but also a variety of motion-editing tasks.
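As a rough illustration of the phase-based idea, the sketch below gives each sinusoid in a Fourier-feature coordinate MLP a time-dependent phase shift, so that varying the time input moves image content. The class name, the linear phase trajectory, and all hyperparameters are assumptions chosen for brevity, not the representation proposed in the dissertation.

    import torch
    import torch.nn as nn

    class PhaseShiftedINR(nn.Module):
        # Hypothetical sketch: a Fourier-feature MLP whose input sinusoids
        # receive a learned per-frequency phase shift that grows with time.
        def __init__(self, num_freqs=256, hidden=256):
            super().__init__()
            self.freqs = nn.Parameter(torch.randn(num_freqs, 2) * 10.0)
            self.phase_velocity = nn.Parameter(torch.zeros(num_freqs))
            self.mlp = nn.Sequential(
                nn.Linear(2 * num_freqs, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3),  # RGB output
            )

        def forward(self, xy, t):
            # xy: (N, 2) coordinates in [-1, 1]; t: scalar time.
            angle = xy @ self.freqs.t() + t * self.phase_velocity
            feats = torch.cat([torch.sin(angle), torch.cos(angle)], dim=-1)
            return self.mlp(feats)

    model = PhaseShiftedINR()
    xy = torch.rand(1024, 2) * 2 - 1
    rgb_now = model(xy, 0.0)      # reconstruct a frame
    rgb_next = model(xy, 0.5)     # same pixels with shifted phases

Under these assumptions, fitting the model to a clip and then rescaling phase_velocity at inference time would exaggerate or attenuate the represented motion, hinting at how phase manipulation in an INR enables motion editing.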
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Persistent Identifier
https://archives.pdx.edu/ds/psu/39181
Recommended Citation
Mai, Long, "Domain Knowledge as Motion-Aware Inductive Bias for Deep Video Synthesis: Two Case Studies" (2022). Dissertations and Theses. Paper 6247.
https://doi.org/10.15760/etd.8106