* Work done as part of Haixin's Master thesis.
We propose an approach for reconstructing a free-moving object from a monocular RGB video. Most existing methods either assume a scene prior, a hand pose prior, or an object category pose prior, or rely on local optimization over multiple sequence segments. Our method allows free interaction with the object in front of a moving camera without relying on any prior, and optimizes the sequence globally without splitting it into segments. We progressively optimize the object shape and pose simultaneously based on an implicit neural representation. A key aspect of our method is a virtual camera system that significantly reduces the search space of the optimization. We evaluate our method on the standard HO3D dataset and on a collection of egocentric RGB sequences captured with a head-mounted device, and demonstrate that our approach significantly outperforms most methods and is on par with recent techniques that assume prior information.
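One way to picture the virtual-camera idea: if a virtual camera is always re-aimed at the object's estimated center, the object stays near the image center, so the pose search effectively collapses toward rotation. The `look_at` helper below is a generic toy illustration of such a re-aimed camera, not code from our implementation; all names in it are hypothetical.

```python
import numpy as np

def look_at(eye, target, up=np.array([0.0, 1.0, 0.0])):
    """Build a world-to-camera rotation for a virtual camera at `eye`
    whose -z axis points at `target` (hypothetical helper for illustration)."""
    f = target - eye                      # forward direction toward the object
    f = f / np.linalg.norm(f)
    r = np.cross(f, up)                   # camera right axis
    r = r / np.linalg.norm(r)
    u = np.cross(r, f)                    # recomputed camera up axis
    # Rows are the camera axes expressed in world coordinates; the camera
    # looks along -z, so the object center always projects near the image center.
    return np.stack([r, u, -f])

# A camera 5 units away, re-aimed at the object center at the origin:
R = look_at(np.array([0.0, 0.0, 5.0]), np.zeros(3))
```

Because the object center is pinned to the optical axis by construction, only the remaining rotational degrees of freedom (and depth) need to be searched, which is the intuition behind the reduced search space.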
Our method optimizes object shape, color, and pose progressively without any segments, producing globally consistent results that outperform the state of the art.
Qualitative results on HO3D: pose results (GT) | pose trajectories | Hampali's meshes | our meshes
Most objects in HO3D are captured with a fixed camera and manipulated by one hand with a fixed grasping style. To verify the generalization capability of our method, we collected sequences in a more general setting with a head-mounted device (Magic Leap 2), involving free-moving objects manipulated by both hands in an unconstrained style. While the standard joint optimization method BARF typically fails, our method produces accurate results for most objects.
Qualitative results on our egocentric sequences: pose results | pose trajectories | BARF's meshes | our meshes
If you find this work useful in your research, please consider citing:
@article{shi2024fmov,
  author  = {Shi, Haixin and Hu, Yinlin and Koguciuk, Daniel and Lin, Juan-Ting and Salzmann, Mathieu and Ferstl, David},
  title   = {Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera},
  journal = {arXiv},
  year    = {2024},
}