Exploiting Semantic Scene Reconstruction for Estimating Building Envelope Characteristics

¹EPFL, ²Schindler EPFL Lab

Architecture of the 3D semantic building envelope reconstruction framework based on an SDF representation.

Abstract

Precise assessment of geometric building envelope characteristics is essential for parametric simulation analysis and informed retrofitting decisions. Previous methods for estimating building characteristics, such as window-to-wall ratio and building footprint area, primarily focus on planar properties from single images, limiting the accuracy and comprehensiveness required for complete 3D building envelope analysis.

To address this limitation, we introduce BuildNet3D, a novel framework that leverages advanced neural surface reconstruction techniques based on signed distance function (SDF) representations for estimating geometric building characteristics. BuildNet3D integrates SDF representations with semantic modalities to recover fine-grained 3D geometry and semantics of building envelopes directly from 2D image inputs.
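
To make the idea concrete, the minimal PyTorch sketch below (our illustration, not the authors' released code) shows one way such a joint field could be structured: a shared MLP trunk maps a 3D point to both a signed distance value and per-class semantic logits. The module names, layer sizes, and class set are assumptions for illustration only.

# Hedged sketch of a joint SDF + semantics field; names and sizes are assumed.
import torch
import torch.nn as nn

class SemanticSDFField(nn.Module):
    def __init__(self, num_classes=3, hidden=256):  # e.g. wall / window / background (assumed classes)
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
        )
        self.sdf_head = nn.Linear(hidden, 1)            # signed distance to the envelope surface
        self.sem_head = nn.Linear(hidden, num_classes)  # semantic logits per 3D point

    def forward(self, xyz):
        h = self.trunk(xyz)
        return self.sdf_head(h), self.sem_head(h)

# Query the field at a batch of 3D points.
field = SemanticSDFField()
sdf, logits = field(torch.rand(1024, 3))

In a standard SDF-based neural surface reconstruction pipeline, volume rendering and losses (photometric and semantic supervision, Eikonal regularization) would sit on top of such a field; the sketch only illustrates the joint geometry/semantics output the abstract describes.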

Evaluations on complex synthetic and real-world building structures demonstrate that BuildNet3D achieves superior geometry reconstruction and higher accuracy in estimating window-to-wall ratios and building footprints than 2D methods. These results underscore the effectiveness of incorporating 3D representations to advance building envelope modeling, characteristic prediction, and practical applications in building analysis and retrofitting.
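
As an illustration of how such a characteristic could be read off a reconstructed model, the hedged sketch below computes a window-to-wall ratio from a triangle mesh carrying per-face semantic labels: window face area divided by total facade (wall plus window) area. The label ids and helper names are assumptions for illustration, not the paper's exact pipeline.

# Hedged sketch: window-to-wall ratio from a semantically labeled triangle mesh.
import numpy as np

WALL, WINDOW = 0, 1  # assumed label ids

def triangle_areas(vertices, faces):
    # vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices
    a = vertices[faces[:, 1]] - vertices[faces[:, 0]]
    b = vertices[faces[:, 2]] - vertices[faces[:, 0]]
    return 0.5 * np.linalg.norm(np.cross(a, b), axis=1)

def window_to_wall_ratio(vertices, faces, face_labels):
    areas = triangle_areas(vertices, faces)
    window_area = areas[face_labels == WINDOW].sum()
    facade_area = areas[np.isin(face_labels, [WALL, WINDOW])].sum()
    return window_area / facade_area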


BibTeX

@article{xu2024buildnet3d,
      title={Exploiting Semantic Scene Reconstruction for Estimating Building Envelope Characteristics}, 
      author={Chenghao Xu and Malcolm Mielle and Antoine Laborde and Ali Waseem and Florent Forest and Olga Fink},
      year={2025},
      journal={Building and Environment},
      eprint={2410.22383},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.22383}, 
}