Causal Reasoning Elicits Controllable 3D Scene Generation
Shen Chen1, Ruiyu Zhao2, Zongkai Wu3, Jenq-Neng Hwang4, Serge Belongie5, Lei Li4,5
1 Zhejiang University 2 East China University of Science and Technology 3 skai worldwide 4 University of Washington 5 University of Copenhagen
Abstract
We propose CausalStruct, a novel framework that integrates causal reasoning into 3D scene generation. It leverages large language models (LLMs) to construct causal graphs that capture object relationships and physical constraints, then iteratively refines the scene using causal ordering and causal interventions. Guided by text or images, it combines a Proportional-Integral-Derivative (PID) controller with 3D Gaussian Splatting and Score Distillation Sampling (SDS) to obtain accurate shapes and stable rendering. Experiments show improved scene coherence, more realistic object interactions, and better adaptability.
Key Contributions
1. Causal Reasoning: Integrates causal order and intervention to refine object interactions and ensure physical plausibility.
2. PID Control: Dynamically tunes object scales and positions to maintain spatial consistency.
3. Text and Image Guidance: Uses text or images to guide object placement and layout, enhancing realism and adaptability.
4. Scene Editing: Supports flexible and intuitive modifications based on causal relationships.
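To make the PID Control contribution concrete, the following is a minimal sketch of a discrete PID loop that drives one object's scale toward a target value. It is an illustration under stated assumptions, not the authors' implementation: the class name `PID`, the helper `refine_scale`, the gain values, and the idea of a single scalar target scale are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete PID controller (hypothetical gains, not from the paper)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float = 1.0) -> float:
        # Accumulate the integral term and estimate the derivative term.
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history on the first step
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def refine_scale(target: float, scale: float, steps: int = 200) -> float:
    """Iteratively correct an object's scale toward an assumed target scale."""
    pid = PID(kp=0.5, ki=0.05, kd=0.1)  # illustrative gains only
    for _ in range(steps):
        scale += pid.step(target - scale)
    return scale
```

Calling `refine_scale(1.0, 2.0)` shrinks an object that starts at twice its assumed target scale. The same loop could be applied per axis to object positions; how the target is actually obtained from the text or image guidance is not specified here.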
Method
Overview of our method. Given a scene description, CausalStruct uses LLMs and multimodal LLMs (MLLMs) with causal reasoning to construct a causal scene graph. A PID controller then refines object scales and positions to keep the layout spatially consistent. Objects and the scene are represented with 3D Gaussian Splatting and optimized with a diffusion model and SDS for high-fidelity rendering.
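The causal scene graph and causal ordering described above can be sketched with a small dependency graph. This is a hedged illustration only: the example objects, the `causal_parents` dictionary, and the helpers `causal_order` and `descendants` are hypothetical, and in the actual method the graph is produced by LLMs and MLLMs rather than written by hand.

```python
from graphlib import TopologicalSorter

# Hypothetical causal scene graph: each object maps to the objects it depends on
# (for example, a lamp resting on a table that stands on the floor).
causal_parents = {
    "floor": set(),
    "table": {"floor"},
    "lamp": {"table"},
    "book": {"table"},
}


def causal_order(parents):
    """Return objects so that every cause precedes its effects."""
    return list(TopologicalSorter(parents).static_order())


def descendants(parents, node):
    """Objects that may need updating after an intervention on `node`."""
    # Invert the parent map into a child map.
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    # Depth-first traversal over downstream objects.
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen
```

Processing objects in `causal_order(causal_parents)` places supporting objects before the objects that rest on them. An intervention such as moving the table would then require revisiting only `descendants(causal_parents, "table")`, which is one plausible reading of how causal relationships can support scene editing.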
Comparison
Scene Editing
Knowledge Distillation