Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video

Part of the International Conference on Learning Representations 2024 (ICLR 2024)

Authors

Yanqin Jiang, Li Zhang, Jin Gao, Weiming Hu, Yao Yao

Abstract

In this paper, we present Consistent4D, a novel approach for generating 4D dynamic objects from uncalibrated monocular videos. Uniquely, we cast 360-degree dynamic object reconstruction as a 4D generation problem, eliminating the need for tedious multi-view data collection and camera calibration. This is achieved by leveraging an object-level 3D-aware image diffusion model as the primary supervision signal for training a dynamic Neural Radiance Field (DyNeRF). Specifically, we propose a cascade DyNeRF to facilitate stable convergence and temporal continuity under this time-discrete supervision signal. To achieve spatial and temporal consistency in 4D generation, we further introduce an interpolation-driven consistency loss, which aligns frames rendered by the DyNeRF with frames interpolated by a pre-trained video interpolation model. Extensive experiments show that the proposed Consistent4D significantly outperforms previous 4D reconstruction approaches as well as per-frame 3D generation approaches, opening up new possibilities for 4D dynamic object generation from a single-view uncalibrated video. Project page: https://consistent4d.github.io
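
To make the interpolation-driven consistency loss concrete, the sketch below shows one plausible PyTorch formulation. It is an illustrative assumption, not the paper's reference implementation: `render_fn`, `interp_model`, and the L1 penalty are hypothetical stand-ins for the DyNeRF renderer, a RIFE-style frame interpolator, and the exact loss used in the paper.

```python
import torch
import torch.nn.functional as F

def interpolation_consistency_loss(render_fn, interp_model, camera, t0, t2):
    """Hypothetical sketch of an interpolation-driven consistency loss.

    render_fn(camera, t): renders a (B, 3, H, W) image from the DyNeRF
        at timestamp t (assumed signature).
    interp_model(a, b): pre-trained video interpolation network that
        predicts the midpoint frame between frames a and b (assumed
        RIFE-style; kept frozen during 4D optimization).
    """
    t1 = 0.5 * (t0 + t2)            # intermediate timestamp
    f0 = render_fn(camera, t0)      # endpoint renders
    f2 = render_fn(camera, t2)
    f1 = render_fn(camera, t1)      # rendered intermediate frame
    # Endpoints are detached so the frozen interpolator only supplies a
    # pseudo-ground-truth target; gradients flow through f1 alone
    # (a design assumption, not necessarily the paper's choice).
    with torch.no_grad():
        f1_pseudo = interp_model(f0.detach(), f2.detach())
    return F.l1_loss(f1, f1_pseudo)
```

In training, a term like this would presumably be added, with a weighting coefficient, to the diffusion-based supervision (e.g., an SDS-style loss) that drives the DyNeRF optimization.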