First Advisor

Feng Liu

Term of Graduation

Winter 2025

Date of Publication

3-13-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (Ph.D.) in Computer Science

Department

Computer Science

Language

English

Subjects

4D Gaussians, Computer Graphics, Computer Vision, Novel view synthesis, Spacetime Gaussians, Video Prediction

Physical Description

1 online resource (xi, 94 pages)

Abstract

Efficient visual synthesis has played an indirect yet important role in the development of neural networks. Graphics processing units (GPUs) were initially designed for visual synthesis tasks such as rendering graphics. However, their parallel computing capabilities later made GPUs the ideal hardware for training neural networks. Since neural networks have greatly benefited from GPUs, a compelling question arises: How can neural networks, in turn, improve efficient visual synthesis?

Existing visual synthesis methods still require improvement for higher efficiency. For example, Monte Carlo rendering takes over 15 minutes to generate a megapixel frame at a high-quality setting on a GPU, as computing thousands of samples per pixel is time-consuming yet essential for an offline path tracer. This thesis presents three technical contributions based on neural networks for efficient visual synthesis. We explore the performance improvement that neural networks can bring to different applications of visual synthesis, including graphics, 4D Gaussians, and videos.

First, we propose speeding up Monte Carlo rendering by significantly reducing the number of pixels that require sampling. We introduce a neural network that predicts future frames instead of rendering them. In each predicted frame, some pixels may not be accurately projected from existing frames due to challenging conditions such as fast motion and large occlusions. Our method therefore estimates a mask alongside each predicted frame; the mask indicates which pixels require ray sampling to correct the prediction.
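The mask-guided correction step described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function names, the boolean-mask convention, and the toy path tracer are all assumptions.

```python
import numpy as np

def composite_frame(predicted, mask, render_pixel):
    """Replace unreliable predicted pixels with freshly rendered samples.

    predicted    : (H, W, 3) float array, network-predicted next frame
    mask         : (H, W) bool array, True where the prediction is unreliable
    render_pixel : callable (y, x) -> length-3 color from the path tracer
    """
    out = predicted.copy()
    ys, xs = np.nonzero(mask)          # only flagged pixels are path-traced
    for y, x in zip(ys, xs):
        out[y, x] = render_pixel(y, x)
    return out

# Toy usage: a stand-in "path tracer" that returns pure white.
pred = np.zeros((4, 4, 3))
mask = np.zeros((4, 4), dtype=bool)
mask[1, 2] = True                      # one pixel flagged as unreliable
frame = composite_frame(pred, mask, lambda y, x: np.ones(3))
```

The point of the design is that the expensive sampling cost scales with the number of flagged pixels rather than with the full frame.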

Second, we introduce a feature splatting method based on 4D Gaussians, named Spacetime Gaussians, which comprises three key components for dynamic view synthesis via volumetric rendering. The proposed Spacetime Gaussian is formulated by augmenting 3D Gaussians with temporal opacity and parametric motion/rotation. This simple yet expressive formulation enables Spacetime Gaussians to capture static, dynamic, and transient content within a scene. Beyond the new representation, we develop splatted feature rendering, which replaces spherical harmonics with neural features. These features improve the rendering of appearance that varies with viewpoint and time while keeping the model compact. Additionally, guided by training error and coarse depth, we sample new Gaussians in areas where current pipelines struggle to converge.
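The time-varying pieces of the formulation above can be sketched numerically. This is a hedged, assumed formulation for illustration only: the exact parameterization in the dissertation may differ, and the symbol names here are not the paper's.

```python
import numpy as np

def temporal_opacity(sigma_s, t, mu_t, sigma_t):
    """Spatial opacity sigma_s modulated by a 1D temporal Gaussian.

    The Gaussian is centered at mu_t with width sigma_t, so a primitive
    contributes most around its own lifetime and fades elsewhere --
    letting the model represent transient content.
    """
    return sigma_s * np.exp(-0.5 * ((t - mu_t) / sigma_t) ** 2)

def polynomial_center(coeffs, t):
    """Evaluate a Gaussian's center as a low-order polynomial in time.

    coeffs : (K, 3) array; row k holds the coefficients of t**k
             for the (x, y, z) components of the trajectory.
    """
    powers = np.array([t ** k for k in range(len(coeffs))])
    return powers @ coeffs             # (3,) center position at time t
```

At `t == mu_t` the temporal factor is 1, so the opacity reduces to the purely spatial opacity of a static 3D Gaussian; away from `mu_t` the primitive smoothly disappears.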

Last, we apply neural visual synthesis to the general video prediction task. To produce high-quality results, we explore a trending architecture, Vision Transformers (ViTs). ViTs have made great progress in many low-level computer vision tasks such as frame interpolation. However, video prediction is more challenging than frame interpolation, and existing ViTs for video prediction are computationally expensive. Hence, we develop an efficient transformer for video prediction: we decouple temporal and channel features and employ transposed attention to reduce computation, implement a global query strategy to capture global motion information, and introduce a depth shift module to better integrate cross-channel and temporal information, enabling efficient and high-quality video prediction.
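The transposed-attention idea mentioned above can be sketched as follows: the attention map is computed over channels rather than over spatial tokens, so its size is C x C instead of N x N and no longer grows with the number of tokens. This is a generic sketch of the technique, not the dissertation's architecture; shapes and names are illustrative assumptions.

```python
import numpy as np

def transposed_attention(q, k, v):
    """Channel-wise ("transposed") attention over token features.

    q, k, v : (N, C) arrays of N token features with C channels.
    Returns an (N, C) array. The attention scores are (C, C), so the
    quadratic cost is in the channel count, not the token count.
    """
    scores = q.T @ k / np.sqrt(q.shape[0])              # (C, C) channel affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over channels
    return v @ weights.T                                # (N, C) output

rng = np.random.default_rng(0)
out = transposed_attention(rng.random((64, 8)),
                           rng.random((64, 8)),
                           rng.random((64, 8)))
```

Because the softmax operates on a C x C matrix, doubling the spatial resolution (and hence N) leaves the attention map the same size, which is the source of the efficiency gain.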

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

Persistent Identifier

https://archives.pdx.edu/ds/psu/43150

chapt-3.mp4 (191630 kB)
Supplementary material for Chapter 3

chapt-4.pdf (13115 kB)
Supplementary material for Chapter 4

chapt-5.mp4 (77772 kB)
Supplementary material for Chapter 5
