IntrinsicAvatar

Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing

Given a monocular video, IntrinsicAvatar learns animatable clothed human avatars with decomposed intrinsic properties including albedo, material, and geometry.

IntrinsicAvatar employs volumetric scattering and explicit ray tracing to learn relightable and animatable avatars from monocular videos. Learning an avatar takes less than 4 hours, and the learned avatars can be rendered under novel poses and lighting conditions.

teaser



Overview

Given an input image and its associated camera rays, we warp the rays to canonical space and perform both primary and secondary ray marching/tracing there. We model geometry with a geometry hash grid \( \gamma_g \) and an MLP \( f_g \), and model volumetric radiance and material with an appearance grid \( \gamma_c \) and two additional MLPs \( f_{rf} \) and \( f_m \). We supervise both the radiance field rendering \( C_{rf} \) and the physically based rendering \( C_{pbr} \) using an L1 loss w.r.t. the input image. Importantly, we model the physically based inverse rendering process with volumetric scattering and explicit secondary ray tracing.
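The dual supervision described above can be sketched as follows. This is a minimal illustration, not the released training code: the function names and the weights `w_rf` and `w_pbr` are hypothetical, and equal weighting of the two terms is an assumption.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error over all pixels and colour channels."""
    return float(np.mean(np.abs(pred - target)))

def photometric_loss(c_rf, c_pbr, c_gt, w_rf=1.0, w_pbr=1.0):
    """Supervise both renderings against the same input pixels.

    c_rf  : (P, 3) colours rendered from the radiance field branch
    c_pbr : (P, 3) colours rendered from the physically based branch
    c_gt  : (P, 3) ground-truth pixel colours from the input image
    w_rf, w_pbr : hypothetical loss weights (assumed equal here)
    """
    return w_rf * l1_loss(c_rf, c_gt) + w_pbr * l1_loss(c_pbr, c_gt)

# Toy example with two pixels: the radiance field branch matches the
# image exactly, the PBR branch is off in two colour channels.
c_gt = np.array([[0.5, 0.5, 0.5], [1.0, 0.0, 0.0]])
c_rf = c_gt.copy()
c_pbr = np.array([[0.25, 0.5, 0.5], [1.0, 0.25, 0.0]])
loss = photometric_loss(c_rf, c_pbr, c_gt)
```

Because both terms compare against the same pixels, the radiance field branch and the PBR branch are both fitted to the input image.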

method overview


Volumetric Scattering

Instead of following the standard rendering equation, which was designed specifically for surface rendering, we go back to the root of the popular neural radiance field: the radiative transfer equation. We formulate the inverse rendering problem as a volumetric scattering process, which is shown to be more robust than surface rendering when abrupt depth changes are present.
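To illustrate the volumetric view, here is a minimal sketch of the standard NeRF-style quadrature for accumulating radiance along one ray. It is a generic discretisation, not necessarily the exact estimator used in IntrinsicAvatar. The name `march_ray` and the input arrays are hypothetical, and in a physically based setting the per-sample `radiance` would itself be estimated from secondary rays rather than given directly.

```python
import numpy as np

def march_ray(sigma, radiance, delta):
    """Accumulate radiance along a single ray.

    sigma    : (N,)   volume density at each sample
    radiance : (N, 3) radiance leaving each sample towards the camera
    delta    : (N,)   distance between consecutive samples
    Returns the accumulated colour (3,) and the per-sample weights (N,).
    """
    # Opacity of each segment.
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance reaching each sample: product of (1 - alpha)
    # over all samples in front of it (1.0 for the first sample).
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha
    color = (weights[:, None] * radiance).sum(axis=0)
    return color, weights

# Toy ray: empty space, then an effectively opaque sample, then a
# sample hidden behind it.
sigma = np.array([0.0, 1e9, 1.0])
radiance = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
delta = np.full(3, 0.1)
color, weights = march_ray(sigma, radiance, delta)
```

Every sample along the ray receives a weight, rather than only a single surface intersection, so the colour varies smoothly as the density changes. This is one intuition for why a volumetric formulation can be more forgiving near depth discontinuities.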

method overview



Video



Results

Comparison to the baseline



More results on animation and relighting of our approach

Publication


Shaofei Wang, Božidar Antić, Andreas Geiger, Siyu Tang
IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

@inproceedings{WangCVPR2024,
         title   = {IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing},
         author  = {Shaofei Wang and Bo\v{z}idar Anti\'{c} and Andreas Geiger and Siyu Tang},
         booktitle = {Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
         year    = {2024}
}