SinGRAV: Learning a Generative Radiance Volume from a Single Natural Scene

Yujie Wang^1,3

Xuelin Chen²

Baoquan Chen³

¹Shandong University

²Tencent AI Lab

³Peking University

Code [Github]

arXiv [Paper]

Top two rows: From observations of a single natural scene, we learn a generative model to synthesize highly plausible variations. Three views rendered from each of input/generated scenes are presented. Note how the global and object configurations vary in generated samples, yet still resemble the original. Bottom: Applications enabled by SinGRAV, including removal (left) and duplicate (middle) operation for editing a 3D scene sample, and scene composition (right) that combines 5 different generated samples to form a novel complex scene.

Abstract

We present a 3D generative model for general natural scenes. Lacking necessary volumes of 3D data characterizing the target scene, we propose to learn from a single scene. Our key insight is that a natural scene often contains multiple constituents whose geometry, texture, and spatial arrangements follow some clear patterns, but still exhibit rich variations over different regions within the same scene. This suggests localizing the learning of a generative model on substantial local regions. Hence, we exploit a multi-scale convolutional network, which possesses the spatial locality bias in nature, to learn from the statistics of local regions at multiple scales within a single scene. In contrast to existing methods, our learning setup bypasses the need to collect data from many homogeneous 3D scenes for learning common features. We coin our method SinGRAV, for learning a Generative RAdiance Volume from a Single natural scene. We demonstrate the ability of SinGRAV in generating plausible and diverse variations from a single scene, the merits of SinGRAV over state-of-the-art generative neural scene methods, as well as the versatility of SinGRAV by its use in a variety of applications, spanning 3D scene editing, composition, and animation.

Supplementary Video

In the video, we provide the qualitative results for the generated scenes from SinGRAV and SinGRAV-derived variants and demonstrate the results on applications including scene editing, composition and animation.

Acknowledgements

We would like to thank Dr. Zhuang for helpful discussions.

Website template from Pix2latent.