VIGA: Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning

About • Supported Domains • Quickstart • Documentation • Citation

About

VIGA is an analysis-by-synthesis code agent for programmatic visual reconstruction. It approaches vision-as-inverse-graphics through an iterative loop of generating, rendering, and verifying scenes against target images.

A single self-reflective agent alternates between two roles:

Generator — Writes and executes scene programs using tools for planning, code execution, asset retrieval, and scene queries.
Verifier — Examines rendered output from multiple viewpoints, identifies visual discrepancies, and provides feedback for the next iteration.

The agent maintains an evolving contextual memory with plans, code diffs, and render history. This write-run-compare-revise loop is self-correcting and requires no finetuning.
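The write-run-compare-revise loop described above can be sketched as follows. This is a minimal illustration, not VIGA's actual implementation: the `Memory` class, `run_loop`, and the three callables (`generate`, `render`, `verify`) are hypothetical names introduced here, and the real agent drives these steps through its tools.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Memory:
    """Hypothetical context store: a plan plus (program, feedback) history."""
    plan: str = ""
    history: List[Tuple[str, str]] = field(default_factory=list)


def run_loop(
    generate: Callable[[Memory], str],              # memory -> scene program
    render: Callable[[str], object],                # program -> rendered output
    verify: Callable[[object], Tuple[bool, str]],   # render -> (matches, feedback)
    max_iters: int = 5,
) -> Tuple[str, Memory]:
    """Alternate generator and verifier roles until the render is accepted."""
    memory = Memory()
    program = ""
    for _ in range(max_iters):
        program = generate(memory)        # write
        rendered = render(program)        # run
        ok, feedback = verify(rendered)   # compare
        memory.history.append((program, feedback))
        if ok:
            break                         # accepted; no further revision
    return program, memory
```

Because every iteration appends the program and the verifier's feedback to memory, the next `generate` call can condition on all earlier attempts, which is what makes the loop self-correcting in this sketch.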

Supported Domains

Mode	Description	Output
BlenderBench	Multi-step 3D graphics editing (Level 1-3)	Blender Python
BlenderGym	Single-step 3D graphics editing	Blender Python
SlideBench	2D slide/document layout synthesis	PowerPoint
Custom Static Scene	Single-view 3D reconstruction	Blender scene
Custom Dynamic Scene	4D dynamic scene with physics	Blender animation

Quickstart

1. Installation: Set up the environment

Prerequisites

You need Conda installed. For 3D modes, an NVIDIA GPU with CUDA support is recommended.

Clone repository

git clone https://github.com/Fugtemypt123/VIGA-release.git && cd VIGA-release
git submodule update --init --recursive
# Download the SAM ViT-H checkpoint
wget -P utils/third_party/sam https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

Create conda environments

VIGA requires separate conda environments for the agent and for each group of tools (blender, sam, sam3d).

conda create -n agent python=3.10 -y && conda activate agent
pip install -r requirements/requirement_agent.txt

conda create -n blender python=3.11 -y && conda activate blender
pip install -r requirements/requirement_blender.txt
cd utils/third_party/infinigen
# Errors here can be ignored as long as `utils/third_party/infinigen/blender` exists afterwards.
INFINIGEN_MINIMAL_INSTALL=True bash scripts/install/interactive_blender.sh
cd ../../..  # return to the repository root for the remaining steps

conda create -n sam python=3.10 -y && conda activate sam
pip install -r requirements/requirement_sam.txt

conda create -n sam3d python=3.11 -y && conda activate sam3d
./requirements/install_sam3d.sh

See Requirements for additional options.

Configure API keys

cp utils/_api_keys.py.example utils/_api_keys.py

Edit utils/_api_keys.py and add your OPENAI_API_KEY and MESHY_API_KEY.
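A small pre-flight check can catch a missing key before a long run starts. The sketch below is an assumption about how you might verify the configuration, not part of VIGA: `missing_keys` is a hypothetical helper, and only the two key names come from this README.

```python
REQUIRED_KEYS = ("OPENAI_API_KEY", "MESHY_API_KEY")  # names from this README


def missing_keys(settings: dict) -> list:
    """Return the required key names that are absent or blank in `settings`."""
    return [
        name for name in REQUIRED_KEYS
        if not str(settings.get(name) or "").strip()
    ]


# Example with placeholder values (not real keys):
print(missing_keys({"OPENAI_API_KEY": "sk-placeholder", "MESHY_API_KEY": ""}))
```

If utils/_api_keys.py defines the keys as module-level constants (an assumption; check the example file), you could pass `vars(module)` for that module as `settings`.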

Configure environment paths

cp utils/_path.py.example utils/_path.py

Edit utils/_path.py so that it points to the directory holding your conda environments. The default is ~/anaconda3/envs. If your environments live elsewhere, update the CONDA_BASE variable or set the VIGA_CONDA_BASE environment variable.
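The lookup described above might resolve roughly as in the sketch below. It is illustrative only: `resolve_conda_base` and `env_python` are hypothetical helpers, the precedence of the environment variable over the default is an assumption, and the `bin/python` layout applies to Unix-like systems. Check utils/_path.py for the actual logic.

```python
import os
from pathlib import Path

DEFAULT_CONDA_BASE = "~/anaconda3/envs"  # default stated in this README


def resolve_conda_base(env=None) -> Path:
    """Use VIGA_CONDA_BASE when set and non-empty, else the default (assumed order)."""
    env = os.environ if env is None else env
    raw = env.get("VIGA_CONDA_BASE") or DEFAULT_CONDA_BASE
    return Path(raw).expanduser()


def env_python(env_name: str, env=None) -> Path:
    """Interpreter path of a named conda environment (Unix layout assumed)."""
    return resolve_conda_base(env) / env_name / "bin" / "python"


print(env_python("blender", {"VIGA_CONDA_BASE": "/opt/conda/envs"}))
```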

2. Usage: Run the agent

conda activate agent
python runners/dynamic_scene.py --task=artist --model=gpt-5 --generator-tools=tools/blender/exec.py,tools/generator_base.py,tools/initialize_plan.py,tools/sam3d/init.py --prompt-setting=init

Custom data: place it in data/dynamic_scene/<your-data-name>, following the format of data/dynamic_scene/artist.
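The run command above packs several tool scripts into one comma-separated --generator-tools value, which is easy to mistype. As a convenience sketch, the command can be assembled from a list in Python. `build_command` is a hypothetical helper, not part of VIGA, and only the flags and paths shown in this README are used.

```python
import shlex


def build_command(task: str, model: str, tools: list, prompt_setting: str) -> list:
    """Assemble the argv for runners/dynamic_scene.py from its parts."""
    return [
        "python", "runners/dynamic_scene.py",
        f"--task={task}",
        f"--model={model}",
        f"--generator-tools={','.join(tools)}",  # comma-separated, no spaces
        f"--prompt-setting={prompt_setting}",
    ]


tools = [
    "tools/blender/exec.py",
    "tools/generator_base.py",
    "tools/initialize_plan.py",
    "tools/sam3d/init.py",
]
cmd = build_command("artist", "gpt-5", tools, "init")
print(shlex.join(cmd))  # paste into a shell, or pass `cmd` to subprocess.run
```

For custom data, the artist example suggests that --task takes the folder name under data/dynamic_scene, but this README does not state that explicitly, so treat it as an assumption.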

Documentation

Doc	Description
Architecture	System design and agent tools
Requirements	Conda environment setup
Runners	Batch execution options

Citation

A paper describing the framework is available on arXiv.

If you find this project useful for your research, please consider citing:

@misc{yin2026visionasinversegraphicsagentinterleavedmultimodal,
      title={Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning},
      author={Shaofeng Yin and Jiaxin Ge and Zora Zhiruo Wang and Xiuyu Li and Michael J. Black and Trevor Darrell and Angjoo Kanazawa and Haiwen Feng},
      year={2026},
      eprint={2601.11109},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.11109},
}
