Hand-4DGS: Feed-Forward 3D Gaussian Splatting for 4D Hand Reconstruction from Egocentric Videos

Jeongmin Bae1,†, Seoha Kim2,†, Marc Pollefeys3,4, Mahdi Rad4, Youngjung Uh1,‡, Taein Kwon5,‡
1Yonsei University, 2Electronics and Telecommunications Research Institute, 3ETH Zurich, 4Microsoft Spatial AI Lab, 5Visual Geometry Group, University of Oxford
†Equal contribution, ‡Corresponding authors

arXiv 2026

Abstract

Hand-4DGS is a feed-forward 3D Gaussian Splatting framework for reconstructing dynamic 4D hands from single-view egocentric videos. By combining a mesh-guided hand representation with temporal modeling, it enables fast inference, strong generalization to unseen videos, accurate hand reconstruction, and improved hand pose estimation on challenging H2O and ARCTIC settings.

Video Results

We evaluate two settings: Hand-4DGS (Per-Video), which trains directly on each target video for comparison with scene-specific optimization baselines, and Hand-4DGS (Generalized), which trains on other sequences and tests on unseen videos.


H2O Dataset

ARCTIC Dataset

Novel View Synthesis

Despite training on single-view egocentric videos, Hand-4DGS reconstructs accurate hand geometry from novel viewpoints, while baselines fail to maintain hand structure.

Novel view synthesis results

Hand-4DGS achieves improved visual quality on the subject4_h1 sequences. The generalized model is evaluated on sequences not seen during training.

Additional novel view synthesis results

Hand Pose Estimation

Hand-4DGS improves upon initial pose estimates from HaMeR, while baseline methods degrade below the initial accuracy.

Hand pose estimation quantitative results

Predicted hand vertices are shown in blue and ground truth in red. Hand-4DGS maintains consistent alignment across frames, while baselines show drift and misalignment.
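
One common way to quantify the alignment between predicted and ground-truth hand vertices is a mean per-vertex position error. This page does not state which metric is used, so the following is only a minimal sketch under that assumption. The function name `mean_per_vertex_error` and the toy coordinates are hypothetical and are not taken from the paper or its code.

```python
import math
from typing import Sequence

Vertex = Sequence[float]  # one (x, y, z) position


def mean_per_vertex_error(pred: Sequence[Vertex], gt: Sequence[Vertex]) -> float:
    """Average Euclidean distance between corresponding vertices.

    Hypothetical helper: assumes pred[i] and gt[i] refer to the same mesh vertex.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy data for illustration only (not results from the paper).
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
initial = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]  # stand-in for an initial pose estimate
refined = [(0.0, 0.0, 0.1), (1.0, 0.1, 0.0)]  # stand-in for a refined estimate

initial_err = mean_per_vertex_error(initial, gt)
refined_err = mean_per_vertex_error(refined, gt)
print(f"initial: {initial_err:.3f}, refined: {refined_err:.3f}")
```

Under this metric, a method improves on its initial pose estimate when the refined error is lower than the initial error, and degrades when it is higher.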