git clone https://github.com/PiggyJerry/MammoVQA.git
cd MammoVQA
conda create -n mammovqa python=3.9
conda activate mammovqa
python -m pip install -r requirements.txt
Sub-Datasets downloading URL:
The JSON file of MammoVQA can be found on Google Drive; after downloading it, unzip the file and put it under /Benchmark/.
After downloading the sub-datasets above, you need to run the corresponding processing code for each of them. Remember to change the dataset path in the code!
If you only want to evaluate your model on MammoVQA, you can skip this step.
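Once the benchmark JSON is in place, a quick sanity check is to load it and count the entries per question type. This is a hypothetical sketch: the actual field names in the MammoVQA JSON (e.g. `Question type`) are assumptions here, so adjust the keys to match the real file.

```python
import json

def summarize_benchmark(path):
    """Load a MammoVQA-style JSON file and count entries per question type.

    NOTE: the field name 'Question type' is an assumption; check the
    actual keys in the downloaded Benchmark JSON and adjust as needed.
    """
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # The file may be a list of records or a dict keyed by sample id.
    records = list(data.values()) if isinstance(data, dict) else data
    counts = {}
    for rec in records:
        qtype = rec.get("Question type", "unknown")
        counts[qtype] = counts.get(qtype, 0) + 1
    return counts
```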
Please follow the repositories of the compared LVLMs (BLIP-2, InstructBLIP, LLaVA-Med, LLaVA-NeXT-interleave, Med-Flamingo, MedDr, MedVInT_TD, MiniGPT-4, RadFM) to prepare their weights and environments.
❗All the LLM weights should be put under MammoVQA/LLM/, except that the weight of MedVInT_TD goes under MammoVQA/Sota/MedVInT_TD/results/ and the weight of RadFM goes under MammoVQA/Sota/RadFM-main/Quick_demo/.
You can fine-tune DiNOv2 on our MammoVQA training dataset and evaluate it on our test dataset.
- S1. Download the DiNOv2 pre-trained weight [here](https://github.com/facebookresearch/dinov2), and modify the DiNOv2 weight path at line 67 of `/MammoVQA/finetune/DiNOv2/models/image_encoder.py`.
- S2. Run `python /MammoVQA/finetune/DiNOv2/main.py` to train the model.
- S3. Run `python /MammoVQA/finetune/DiNOv2/eval.py` to evaluate the model; the result file is written to `/MammoVQA/Result/DiNOv2.json`.
- S4. Run `python /MammoVQA/Eval/Visiononly.py` to calculate metrics.
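The metric script reads the generated result JSON. As a rough illustration of what such a step does, here is a minimal exact-match accuracy computation; the record format (paired `pred`/`gt` string fields) is an assumption for illustration, not the actual schema of `/MammoVQA/Result/DiNOv2.json`.

```python
import json

def accuracy_from_results(path):
    """Compute simple exact-match accuracy from a result JSON.

    Hypothetical schema: a list of records, each with 'pred' and 'gt'
    string fields; the real result file may be structured differently.
    """
    with open(path, "r", encoding="utf-8") as f:
        results = json.load(f)
    correct = sum(
        1 for r in results
        if r["pred"].strip().lower() == r["gt"].strip().lower()
    )
    return correct / len(results) if results else 0.0
```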
For a quick start, you can check the Quick_demo path.
We demonstrate a simple diagnosis case here to show how to run inference on MammoVQA with our LLaVA-Mammo.
Feel free to modify it as you want.
- S1. Download the model checkpoint of LLaVA-Mammo, and unzip it to the `Quick_demo` path.
- S2. Run `python /MammoVQA/Quick_demo/eval.py` to run inference; the result file is written to `/MammoVQA/Result/LLaVA-Mammo.json`.
- S3. Run `python /MammoVQA/Eval/LLM.py` to calculate metrics.
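Scoring free-text LLM answers requires matching the generated text against the reference option. A common approach, sketched hypothetically here (this is not the repository's actual matching logic in `Eval/LLM.py`), is to normalize the answer and look for the ground-truth option as a whole word:

```python
import re

def matches_option(generated, gt_option):
    """Check whether a free-text LLM answer names the ground-truth option.

    Hypothetical matcher: lowercases both strings, strips punctuation,
    and searches for the option as a whole-word phrase in the answer.
    """
    norm = re.sub(r"[^a-z0-9 ]", " ", generated.lower())
    target = re.sub(r"[^a-z0-9 ]", " ", gt_option.lower()).strip()
    return re.search(r"\b" + re.escape(target) + r"\b", norm) is not None
```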
@article{zhu2025benchmark,
title={A Benchmark for Breast Cancer Screening and Diagnosis in Mammogram Visual Question Answering},
author={Zhu, Jiayi and Huang, Fuxiang and Luo, Qiong and Chen, Hao},
journal={Nature Communications},
year={2025},
publisher={Nature Publishing Group UK London}
}