EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Part of the International Conference on Learning Representations 2025 (ICLR 2025)


Authors

Kaizhi Zheng, Xiaotong Chen, Xuehai He, Jing Gu, Linjie Li, Zhengyuan Yang, Kevin Lin, Jianfeng Wang, Lijuan Wang, Xin Wang

Abstract

Given the steep learning curve of professional 3D software and the time-consuming process of managing large 3D assets, language-guided 3D scene editing has significant potential in fields such as virtual reality, augmented reality, and gaming. However, recent approaches to language-guided 3D scene editing either require manual intervention or focus only on appearance modifications without supporting comprehensive scene layout changes. In response, we propose EditRoom, a unified framework capable of executing a variety of layout edits through natural language commands, without requiring manual intervention. Specifically, EditRoom leverages Large Language Models (LLMs) for command planning and generates target scenes using a diffusion-based method, enabling six types of edits: rotate, translate, scale, replace, add, and remove. To address the lack of data for language-guided 3D scene editing, we have developed an automatic pipeline to augment existing 3D scene synthesis datasets and introduced EditRoom-DB, a large-scale dataset with 83k editing pairs, for training and evaluation. Our experiments demonstrate that our approach consistently outperforms all baselines across all metrics, indicating higher accuracy and coherence in language-guided scene layout editing.
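
To make the command vocabulary concrete, here is a minimal, hypothetical Python sketch of the six edit types named in the abstract, applied to a toy object-level layout. All names here (SceneObject, apply_edit, the command schema) are illustrative assumptions, not the paper's actual interface; in EditRoom the target scene is produced by a graph diffusion model conditioned on the LLM-planned commands, not by direct parameter updates like these.

```python
from dataclasses import dataclass

# Toy scene representation: each object carries a category label plus
# position, rotation (yaw), and scale -- a simplification of the
# object-level attributes a room layout edit manipulates.
@dataclass
class SceneObject:
    category: str
    position: tuple[float, float, float] = (0.0, 0.0, 0.0)
    rotation: float = 0.0  # yaw angle in degrees
    scale: float = 1.0

def apply_edit(scene: dict[str, SceneObject], command: dict) -> None:
    """Apply one atomic edit of the six types named in the abstract.
    Hypothetical stub: illustrates the command vocabulary an LLM
    planner could emit, not the paper's diffusion-based generator."""
    obj_id, etype = command["object"], command["type"]
    if etype == "add":
        scene[obj_id] = SceneObject(category=command["category"])
    elif etype == "remove":
        scene.pop(obj_id, None)
    elif etype == "replace":
        scene[obj_id].category = command["category"]
    elif etype == "translate":
        dx, dy, dz = command["offset"]
        x, y, z = scene[obj_id].position
        scene[obj_id].position = (x + dx, y + dy, z + dz)
    elif etype == "rotate":
        scene[obj_id].rotation += command["angle"]
    elif etype == "scale":
        scene[obj_id].scale *= command["factor"]

# Example: a planner might decompose "move the chair toward the desk
# and turn it around" into a sequence of such atomic commands.
scene = {"chair_1": SceneObject("chair"), "desk_1": SceneObject("desk")}
apply_edit(scene, {"object": "chair_1", "type": "translate", "offset": (1.0, 0.0, 0.5)})
apply_edit(scene, {"object": "chair_1", "type": "rotate", "angle": 180.0})
```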