Hand-4DGS: Feed-Forward 3D Gaussian Splatting for 4D Hand Reconstruction from Egocentric Videos

Jeongmin Bae¹,†, Seoha Kim²,†, Marc Pollefeys³,⁴, Mahdi Rad⁴, Youngjung Uh¹,‡, Taein Kwon⁵,‡

¹Yonsei University, ²Electronics and Telecommunications Research Institute, ³ETH Zurich, ⁴Microsoft Spatial AI Lab, ⁵Visual Geometry Group, University of Oxford

† Equal contribution, ‡ Corresponding authors

arXiv 2026

arXiv (Coming Soon) Code (Coming Soon)

Abstract

Hand-4DGS is a feed-forward 3D Gaussian Splatting framework for reconstructing dynamic 4D hands from single-view egocentric videos. By combining a mesh-guided hand representation with temporal modeling, it enables fast inference, strong generalization to unseen videos, accurate hand reconstruction, and improved hand pose estimation on challenging H2O and ARCTIC settings.

Video Results

We evaluate two settings: Hand-4DGS (Per-Video), which trains directly on each target video for comparison with scene-specific optimization baselines, and Hand-4DGS (Generalized), which trains on other sequences and tests on unseen videos.
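The two settings differ only in which sequences the model sees during training. A minimal sketch of that split follows, working at sequence level only; the function name `make_splits`, the setting labels, and the example sequence names are hypothetical, and how frames within a single video are divided is not specified here, so it is left out.

```python
from typing import List, Tuple


def make_splits(sequences: List[str], target: str, setting: str) -> Tuple[List[str], List[str]]:
    """Return (train, test) sequence lists for one target video.

    Hypothetical helper: setting names and the sequence-level granularity
    are assumptions made for illustration.
    """
    if target not in sequences:
        raise ValueError(f"unknown target sequence: {target}")
    if setting == "per_video":
        # Per-Video: train directly on the target video itself.
        return [target], [target]
    if setting == "generalized":
        # Generalized: train on all other sequences, test on the unseen target.
        return [s for s in sequences if s != target], [target]
    raise ValueError(f"unknown setting: {setting}")


if __name__ == "__main__":
    seqs = ["seq_a", "seq_b", "seq_c"]  # placeholder names, not real dataset IDs
    print(make_splits(seqs, "seq_b", "per_video"))
    print(make_splits(seqs, "seq_b", "generalized"))
```

In the generalized setting the target never appears in the training list, which is what makes the test video unseen.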

H2O Dataset

ARCTIC Dataset

Novel View Synthesis

Despite being trained only on single-view egocentric videos, Hand-4DGS preserves accurate hand geometry when rendered from novel viewpoints, while baselines fail to maintain hand structure.

Results are shown on subject4_h1 sequences, where Hand-4DGS achieves improved visual quality. The generalized model is evaluated on sequences it did not see during training.

Hand Pose Estimation

Hand-4DGS improves upon initial pose estimates from HaMeR, while baseline methods degrade below the initial accuracy.

Predicted hand vertices are shown in blue and ground truth in red. Hand-4DGS maintains consistent alignment across frames, while baselines show drift and misalignment.
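The alignment between predicted and ground-truth vertices can also be expressed numerically. The sketch below assumes mean per-vertex Euclidean distance as the measure, which is a common choice but is not stated on this page; the function names are hypothetical, and vertices are assumed to be in the same coordinate frame and units.

```python
import math
from typing import List, Sequence

Vertex = Sequence[float]  # (x, y, z)


def mean_vertex_error(pred: Sequence[Vertex], gt: Sequence[Vertex]) -> float:
    """Mean Euclidean distance between corresponding vertices of one frame."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def per_frame_errors(pred_frames: Sequence[Sequence[Vertex]],
                     gt_frames: Sequence[Sequence[Vertex]]) -> List[float]:
    """Per-frame errors; a rising trend over time would indicate drift."""
    if len(pred_frames) != len(gt_frames):
        raise ValueError("frame counts differ")
    return [mean_vertex_error(p, g) for p, g in zip(pred_frames, gt_frames)]


def improves_over_initial(initial_err: float, refined_err: float) -> bool:
    """True if the refined estimate is closer to ground truth than the initial one."""
    return refined_err < initial_err
```

Comparing the error of an initial estimate with the error after refinement, frame by frame, gives a simple check of whether a method improves on or degrades its starting pose.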