Video frame interpolation typically involves two steps: motion estimation and pixel synthesis. Such a two-step approach heavily depends on the quality of motion estimation. This paper presents a robust video frame interpolation method that combines these two steps into a single process. Specifically, our method considers pixel synthesis for the interpolated frame as local convolution over two input frames. The convolution kernel captures both the local motion between the input frames and the coefficients for pixel synthesis. Our method employs a deep fully convolutional neural network to estimate a spatially-adaptive convolution kernel for each pixel. This deep neural network can be directly trained end to end using widely available video data without any difficult-to-obtain ground-truth data like optical flow. Our experiments show that the formulation of video interpolation as a single convolution process allows our method to gracefully handle challenges like occlusion, blur, and abrupt brightness change and enables high-quality video frame interpolation.
Niklaus, S., Mai, L., & Liu, F. (2017). Video Frame Interpolation via Adaptive Convolution. arXiv preprint arXiv:1703.07514.
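To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of pixel synthesis as spatially-adaptive local convolution: for each output pixel, a per-pixel kernel — which in the paper would be predicted by the deep network — weights co-located patches from both input frames and sums them into the interpolated pixel. The function name, the `(H, W, k, 2k)` kernel layout (left half weighting frame 1, right half frame 2), and the edge padding are illustrative assumptions.

```python
import numpy as np

def adaptive_conv_interpolate(frame1, frame2, kernels, k=5):
    """Synthesize an interpolated frame via per-pixel adaptive convolution.

    frame1, frame2: (H, W) grayscale input frames.
    kernels: (H, W, k, 2*k) per-pixel kernels (hypothetical layout):
             columns [:, :k] weight the patch from frame1,
             columns [:, k:] weight the patch from frame2.
    In the paper these kernels come from a CNN; here they are an input.
    """
    H, W = frame1.shape
    r = k // 2
    # Pad so every pixel has a full k-by-k neighborhood.
    p1 = np.pad(frame1, r, mode="edge")
    p2 = np.pad(frame2, r, mode="edge")
    out = np.empty((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            patch1 = p1[y:y + k, x:x + k]
            patch2 = p2[y:y + k, x:x + k]
            kern = kernels[y, x]  # shape (k, 2*k)
            # One kernel jointly covers motion and synthesis coefficients.
            out[y, x] = (kern[:, :k] * patch1).sum() + (kern[:, k:] * patch2).sum()
    return out
```

If each kernel's weights sum to one, the output pixel is a convex combination of neighborhood pixels from both frames, which is why occlusion and brightness changes can be absorbed into the learned weights rather than handled by an explicit flow field.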