Part of International Conference on Learning Representations 2025 (ICLR 2025) Conference
Ziqi Lu, Heng Yang, Danfei Xu, Boyi Li, Boris Ivanovic, Marco Pavone, Yue Wang
Emerging 3D geometric foundation models, such as DUSt3R, offer a promising approach for in-the-wild 3D vision tasks. However, due to the high-dimensional nature of the problem space and scarcity of high-quality 3D data, these pre-trained models still struggle to generalize to many challenging circumstances, such as limited view overlap or low lighting. To address this, we propose LoRA3D, an efficient self-calibration pipeline to specialize the pre-trained models to target scenes using their own multi-view predictions. Taking sparse RGB images as input, we leverage robust optimization techniques to refine multi-view predictions and align them into a global coordinate frame. In particular, we incorporate prediction confidence into the geometric optimization process, automatically re-weighting the confidence to better reflect point estimation accuracy. We use the calibrated confidence to generate high-quality pseudo labels for the calibrating views and fine-tune the models using low-rank adaptation (LoRA) on the pseudo-labeled data. Our method does not require any external priors or manual labels. It completes the self-calibration process on a single standard GPU within just 5 minutes. Each low-rank adapter requires only 18MB of storage. We evaluated our method on more than 160 scenes from the Replica, TUM and Waymo Open datasets, achieving up to 88% performance improvement on 3D reconstruction, multi-view pose estimation and novel-view rendering. For more details, please visit our project page at https://520xyxyzq.github.io/lora3d/.
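To illustrate why a low-rank adapter can be stored compactly, the following is a minimal sketch of a generic LoRA linear layer in plain Python. It is not the LoRA3D implementation: the class name LoRALinear, the helper matvec, the 8x8 layer size, the rank of 2, and the scaling alpha / rank are illustrative assumptions, and the sketch does not attempt to reproduce the 18MB figure. The pre-trained weight W stays frozen, and only the two small factors A and B would be trained and saved.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of generic LoRA, not the authors' code.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weight, shape (out_dim, in_dim)
        # A starts with small random values; B starts at zero, so the adapted
        # layer initially reproduces the pre-trained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)               # W x
        update = matvec(self.B, matvec(self.A, x))  # B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_adapter_params(self):
        """Number of values that would be trained and stored: rank * (in_dim + out_dim)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Illustrative sizes only: an 8x8 identity weight with a rank-2 adapter.
dim, rank = 8, 2
W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
layer = LoRALinear(W, rank=rank)
x = [float(i + 1) for i in range(dim)]

full_params = dim * dim
adapter_params = layer.num_adapter_params()
print(full_params, adapter_params)
```

The adapter holds rank * (in_dim + out_dim) values instead of in_dim * out_dim, so the saving grows with layer width. Because B is initialized to zero, fine-tuning starts from the unmodified pre-trained predictions.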