🤖 Demo
- [2025-06] We are co-organizing the CVPR 2025 3D Scene Understanding Challenge. You are warmly invited to participate in the MMScan Hierarchical Visual Grounding track! The challenge test server is now online here. We look forward to your strong submissions!
- [2025-01] We are delighted to present the official release of MMScan-devkit, which encompasses a suite of data-processing utilities, benchmark evaluation tools, and adaptations of some models for the MMScan benchmarks. We invite you to explore these resources and welcome any feedback or questions you may have!
With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception has attracted increasing attention due to its connection to the physical world and has made rapid progress. However, limited by existing datasets, previous works mainly focus on understanding object properties or inter-object spatial relationships in a 3D scene. To tackle this problem, this paper builds the largest ever multi-modal 3D scene dataset and benchmark with hierarchical grounded language annotations, MMScan. It is constructed based on a top-down logic, from region to object level and from a single target to inter-target relationships, covering holistic aspects of spatial and attribute understanding. The overall pipeline incorporates powerful VLMs via carefully designed prompts to initialize the annotations efficiently and further involves human correction in the loop to ensure the annotations are natural, correct, and comprehensive. Built upon existing 3D scanning data, the resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks. We evaluate representative baselines on our benchmarks, analyze their capabilities in different aspects, and showcase the key problems to be addressed in the future. Furthermore, we use this high-quality dataset to train state-of-the-art 3D visual grounding models and LLMs and obtain remarkable performance improvements both on existing benchmarks and in in-the-wild evaluation.
- Clone the GitHub repo.

  ```bash
  git clone git@github.com:rbler1234/MMScan.git
  cd MMScan
  ```
- Install requirements.

  Your environment needs Python 3.8 or higher.

  ```bash
  conda activate your_env_name
  python install.py all/VG/QA
  ```

  Use `"all"` to install all components, or specify `"VG"` or `"QA"` if you only need the components for Visual Grounding or Question Answering, respectively.
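  After installation, a quick import check can confirm the devkit is available in your environment (a minimal sketch; it only relies on the `from mmscan import MMScan` import used throughout this README):

  ```python
  # Minimal sanity check: the devkit should be importable after installation.
  from mmscan import MMScan

  print('MMScan devkit imported successfully:', MMScan)
  ```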
- Download the EmbodiedScan and MMScan annotations. (Fill in the form to apply for downloading.)

  Create a folder `mmscan_data/` and then unzip the files. For the first zip file, put `embodiedscan` under `mmscan_data/embodiedscan_split` and rename it to `embodiedscan-v1`. For the second zip file, put `MMScan-beta-release` under `mmscan_data/` and `embodiedscan-v2` under `mmscan_data/embodiedscan_split`. The directory structure should be as below:

  ```
  mmscan_data
  ├── embodiedscan_split
  │   ├── embodiedscan-v1/   # EmbodiedScan v1 data in 'embodiedscan.zip'
  │   ├── embodiedscan-v2/   # EmbodiedScan v2 data in 'embodiedscan-v2-beta.zip'
  ├── MMScan-beta-release     # MMScan data in 'embodiedscan-v2-beta.zip'
  ```
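  If you want to sanity-check the layout, a small helper like the one below can verify that each expected folder exists (a sketch; the paths are taken directly from the tree above):

  ```python
  from pathlib import Path

  # Expected layout, following the directory tree above.
  root = Path('mmscan_data')
  expected = [
      root / 'embodiedscan_split' / 'embodiedscan-v1',
      root / 'embodiedscan_split' / 'embodiedscan-v2',
      root / 'MMScan-beta-release',
  ]
  for path in expected:
      print(f"{path}: {'found' if path.is_dir() else 'MISSING'}")
  ```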
- Prepare the point cloud files.

  Please refer to the guide here.
The MMScan Toolkit provides comprehensive tools for dataset handling and model evaluation across the MMScan tasks. The dataset tool allows seamless access to the data required for each task within MMScan.
- Initialize the dataset for a specific task with:

  ```python
  from mmscan import MMScan  # (1) The dataset tool

  # split: 'train' / 'test' / 'val'; task: 'MMScan-VG' / 'MMScan-QA'
  my_dataset = MMScan(split='train', task='MMScan-VG')

  # Access a specific sample (here, the first one)
  print(my_dataset[0])
  ```
Note: For the test split, we have only made the VG portion publicly available, while the QA portion has not been released.
- Each dataset item is a dictionary containing data information from three modalities: language, 2D, and 3D. (Details)
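  For a quick look at what a sample contains, you can print its keys and value types (a minimal sketch; the field names are whatever the devkit returns, and nothing beyond the dictionary structure stated above is assumed):

  ```python
  from mmscan import MMScan

  dataset = MMScan(split='val', task='MMScan-VG')
  sample = dataset[0]

  # Each sample is a dictionary; list its fields and their types.
  for key, value in sample.items():
      print(key, type(value))
  ```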
Our evaluation tool is designed to streamline the assessment of model outputs for the MMScan tasks, providing essential metrics to gauge model performance effectively. We provide three evaluators: `VisualGroundingEvaluator`, `QuestionAnsweringEvaluator`, and `GPTEvaluator`. For more details, please refer to the documentation.
```python
from mmscan import MMScan

# (2) The evaluator tools: 'VisualGroundingEvaluator', 'QuestionAnsweringEvaluator', 'GPTEvaluator'
from mmscan import VisualGroundingEvaluator, QuestionAnsweringEvaluator, GPTEvaluator

# For VisualGroundingEvaluator and QuestionAnsweringEvaluator: initialize the
# evaluator, feed it the model output, then run the evaluation and save the
# final results.
my_evaluator = VisualGroundingEvaluator(show_results=True)  # or QuestionAnsweringEvaluator(show_results=True)
my_evaluator.update(model_output)
metric_dict = my_evaluator.start_evaluation()

# For GPTEvaluator: initialize it with an API key, evaluate the model's output
# using multithreading, and save the results to the specified path (tmp_path).
gpt_evaluator = GPTEvaluator(API_key='XXX')
metric_dict = gpt_evaluator.load_and_eval(model_output, num_threads=1, tmp_path='XXX')
```
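For orientation, a minimal end-to-end sketch combining the dataset and evaluator tools might look as follows. Here `my_model` is a placeholder for your own grounding model, and the structure of its predictions must follow the output format expected by the evaluator's `update()` (see the documentation); it is not assumed here.

```python
from mmscan import MMScan, VisualGroundingEvaluator

val_set = MMScan(split='val', task='MMScan-VG')
evaluator = VisualGroundingEvaluator(show_results=True)

model_output = []
for sample in val_set:
    # `my_model` is a placeholder; its predictions must follow the output
    # format expected by the evaluator (see the documentation).
    model_output.append(my_model(sample))

evaluator.update(model_output)
metric_dict = evaluator.start_evaluation()
print(metric_dict)
```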
To participate in the MMScan Visual Grounding Challenge and submit your results, please follow the instructions available on our test server. We welcome your feedback and inquiries—please feel free to contact us at [email protected].
MMScan Visual Grounding benchmark (main metrics):

| Methods | gTop-1 | gTop-3 | AP<sub>sample</sub> | AP<sub>box</sub> | AR | Release | Download |
|---|---|---|---|---|---|---|---|
| ScanRefer | 4.74 | 9.19 | 9.49 | 2.28 | 47.68 | code | model |
| MVT | 7.94 | 13.07 | 13.67 | 2.50 | 86.86 | - | - |
| BUTD-DETR | 15.24 | 20.68 | 18.58 | 9.27 | 66.62 | - | - |
| ReGround3D | 16.35 | 26.13 | 22.89 | 5.25 | 43.24 | - | - |
| EmbodiedScan | 19.66 | 34.00 | 29.30 | 15.18 | 59.96 | code | model |
| 3D-VisTA | 25.38 | 35.41 | 33.47 | 6.67 | 87.52 | - | - |
| ViL3DRef | 26.34 | 37.58 | 35.09 | 6.65 | 86.86 | - | - |
MMScan Question Answering benchmark (main metrics):

| Methods | Overall | ST-attr | ST-space | OO-attr | OO-space | OR | Advanced | Release | Download |
|---|---|---|---|---|---|---|---|---|---|
| LL3DA | 45.7 | 39.1 | 58.5 | 43.6 | 55.9 | 37.1 | 24.0 | code | model |
| LEO | 54.6 | 48.9 | 62.7 | 50.8 | 64.7 | 50.4 | 45.9 | code | model |
| LLaVA-3D | 61.6 | 58.5 | 63.5 | 56.8 | 75.6 | 58.0 | 38.5 | - | - |
Note: These two tables only show the results for main metrics; see the paper for complete results.
We have released the code of several models under `./models`.
- MMScan annotation and samples for ARKitScenes.
- Code for more MMScan Visual Grounding and Question Answering baselines.
- Full release and further updates.