View-consistent Object Removal in Radiance Fields

Case Western Reserve University
ACM MM 2024

Abstract


Radiance Fields (RFs) have emerged as a crucial technology for 3D scene representation, enabling the synthesis of novel views with remarkable realism. However, as RFs become more widely used, the need for effective editing techniques that maintain coherence across different perspectives becomes evident. Current methods primarily depend on per-frame 2D image inpainting, which often fails to maintain consistency across views, thus compromising the realism of edited RF scenes. In this work, we introduce a novel RF editing pipeline that significantly enhances consistency by requiring the inpainting of only a single reference image. This image is then projected across multiple views using a depth-based approach, effectively reducing the inconsistencies observed with per-frame inpainting. However, projections typically assume photometric consistency across views, which is often impractical in real-world settings. To accommodate realistic variations in lighting and viewpoint, our pipeline adjusts the appearance of the projected views by generating multiple directional variants of the inpainted image, thereby adapting to different photometric conditions. Additionally, we present an effective and robust multi-view object segmentation approach as a valuable byproduct of our pipeline. Extensive experiments demonstrate that our method significantly surpasses existing frameworks in maintaining content consistency across views and enhancing visual quality.
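To make the depth-based projection concrete, below is a minimal sketch of the core idea, assuming pinhole intrinsics shared across views, OpenCV-style camera conventions (z pointing forward), and no occlusion handling or photometric adaptation; all names are illustrative and not taken from our code.

import numpy as np

def project_reference_to_view(ref_img, ref_depth, K, ref_c2w, tgt_w2c, tgt_hw):
    """Warp an inpainted reference image into a target view using depth.

    Every reference pixel is unprojected to a 3D world point using its
    estimated depth, then reprojected into the target camera. This
    simplified version splats colors at the nearest target pixel and
    ignores occlusions and z-buffering.
    """
    H, W = ref_depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3 x N

    # Unproject: scale camera rays by depth, then move to world coordinates.
    cam = np.linalg.inv(K) @ pix * ref_depth.reshape(1, -1)
    world = ref_c2w[:3, :3] @ cam + ref_c2w[:3, 3:4]

    # Reproject into the target camera and dehomogenize.
    tgt = tgt_w2c[:3, :3] @ world + tgt_w2c[:3, 3:4]
    proj = K @ tgt
    uv = (proj[:2] / np.clip(proj[2:3], 1e-8, None)).T                  # N x 2

    # Keep points in front of the target camera and inside the frame.
    Ht, Wt = tgt_hw
    xi, yi = np.round(uv[:, 0]).astype(int), np.round(uv[:, 1]).astype(int)
    valid = (xi >= 0) & (xi < Wt) & (yi >= 0) & (yi < Ht) & (tgt[2] > 0)

    out = np.zeros((Ht, Wt, 3), dtype=ref_img.dtype)
    out[yi[valid], xi[valid]] = ref_img.reshape(-1, 3)[valid]
    return out

In practice the warped result is only a starting point: the directional variants mentioned above adapt its appearance to each view's photometric conditions before the inpainted Radiance Field is trained.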

Inpainting Pipeline


An overview of our proposed pipeline. We begin by selecting a reference camera pose from the training dataset, chosen as the pose with the minimal average distance to all other poses on the SE(3) manifold. The chosen reference view is then processed in three key steps: masking, inpainting, and depth estimation, which yield three outputs: the mask, the inpainted image, and the depth map, respectively. These outputs drive a multi-view projection that produces a set of inpainted images from multiple views. Finally, an inpainted Radiance Field is trained on these images.
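As an illustration of the reference-view selection step, the sketch below picks the pose with the minimal average distance to all other poses, using one common choice of SE(3) distance (a weighted geodesic rotation angle plus the Euclidean distance between camera centers); the exact metric and weighting used in the paper may differ.

import numpy as np

def select_reference_pose(c2w_poses, rot_weight=1.0):
    """Return the index of the camera pose closest, on average, to all others.

    c2w_poses: list of 4x4 camera-to-world matrices.
    Distance between two poses: rot_weight times the geodesic rotation
    angle arccos((trace(Ri^T Rj) - 1) / 2), plus the Euclidean distance
    between camera centers.
    """
    R = np.stack([p[:3, :3] for p in c2w_poses])   # n x 3 x 3
    t = np.stack([p[:3, 3] for p in c2w_poses])    # n x 3
    n = len(c2w_poses)

    avg = np.zeros(n)
    for i in range(n):
        rel = np.einsum('ba,nbc->nac', R[i], R)    # Ri^T @ Rj for every j
        cos = np.clip((np.trace(rel, axis1=1, axis2=2) - 1.0) / 2.0, -1.0, 1.0)
        ang = np.arccos(cos)                       # rotation geodesic (radians)
        trans = np.linalg.norm(t - t[i], axis=1)   # camera-center distance
        avg[i] = (ang * rot_weight + trans).mean()
    return int(np.argmin(avg))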

Scene Object Removal Results



Comparison with Previous Methods

Original Scene · SPIn-NeRF · NeRFiller · Ours

Multi-view Consistency

Here we showcase how our pipeline ensures multi-view consistency by counting keypoint matches across different rendered views. The matches are computed by SuperGlue, and we only consider those that fall within the inpainted region. As the results show, our pipeline yields significantly more matches within the inpainted region, which indicates better multi-view consistency.

SPIn-NeRF · NeRFiller · Ours
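For reference, a minimal sketch of the counting step described above, assuming keypoints and match indices have already been produced by a matcher such as SuperGlue (which marks unmatched keypoints with -1); the function name and array layout are illustrative.

import numpy as np

def count_matches_in_mask(kpts0, kpts1, matches, mask0, mask1):
    """Count matches whose endpoints both fall inside the inpainted region.

    kpts0, kpts1: (N, 2) and (M, 2) arrays of (x, y) keypoints in two views.
    matches:      (N,) array with matches[i] = j linking kpts0[i] to
                  kpts1[j], or -1 when kpts0[i] is unmatched.
    mask0, mask1: boolean inpainting masks for the two views.
    """
    count = 0
    for i, j in enumerate(matches):
        if j < 0:
            continue
        x0, y0 = np.round(kpts0[i]).astype(int)
        x1, y1 = np.round(kpts1[j]).astype(int)
        if mask0[y0, x0] and mask1[y1, x1]:
            count += 1
    return count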

Mask Consistency

Our multi-view segmentation approach not only maintains mask consistency across different views, but also captures regions without semantic meaning (e.g., the shadow on the left side of the book).

Original Scene · Masks
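To quantify this consistency, one illustrative check (ours, not a metric reported in the paper) is to warp the mask from one view into another, e.g. with the depth-based projection sketched earlier, and measure its overlap with the mask predicted in that view.

import numpy as np

def mask_iou(warped_mask, target_mask):
    """IoU between a mask warped from another view and the target view's mask.

    Both are boolean arrays of the same shape; values near 1.0 indicate
    that the two views segment the same region.
    """
    inter = np.logical_and(warped_mask, target_mask).sum()
    union = np.logical_or(warped_mask, target_mask).sum()
    return inter / union if union else 1.0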

Citation

If you find our work helpful, please consider citing us:

@misc{lu2024viewconsistentobjectremovalradiance,
      title={View-consistent Object Removal in Radiance Fields},
      author={Yiren Lu and Jing Ma and Yu Yin},
      year={2024},
      eprint={2408.02100},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.02100},
}