MammoVQA
Clone repository

git clone https://github.com/PiggyJerry/MammoVQA.git
cd MammoVQA
conda create -n mammovqa python==3.9
conda activate mammovqa

python -m pip install -r requirements.txt

Prepare MammoVQA dataset

Sub-dataset download links:

| Dataset Name | Dataset Link | Paper Link | Access |
| --- | --- | --- | --- |
| BMCD | Link | Digital subtraction of temporally sequential mammograms for improved detection and classification of microcalcifications | Open Access |
| CBIS-DDSM | Link | A curated mammography data set for use in computer-aided detection and diagnosis research | Open Access |
| CDD-CESM | Link | Categorized contrast enhanced mammography dataset for diagnostic and artificial intelligence research | Open Access |
| DMID | Link | Digital mammography dataset for breast cancer diagnosis research (dmid) with breast mass segmentation analysis | Open Access |
| INbreast | Link | Inbreast: toward a full-field digital mammographic database | Open Access |
| MIAS | Link | The mammographic images analysis society digital mammogram database | Open Access |
| CSAW-M | Link | Csaw-m: An ordinal classification dataset for benchmarking mammographic masking of cancer | Credentialed Access |
| KAU-BCMD | Link | King abdulaziz university breast cancer mammogram dataset (kau-bcmd) | Open Access |
| VinDr-Mammo | Link | Vindr-mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography | Credentialed Access |
| RSNA | Link | Rsna screening mammography breast cancer detection ai challenge (RSNA: Radiological Society of North America) | Open Access |
| EMBED | Link | The emory breast imaging dataset (embed): A racially diverse, granular dataset of 3.4 million screening and diagnostic mammographic images | Credentialed Access |
| DBT-Test | Link | Detection of masses and architectural distortions in digital breast tomosynthesis: a publicly available dataset of 5,060 patients and a deep learning model | Open Access |
| LAMIS | Link | Lamis-dmdb: A new full field digital mammography database for breast cancer ai-cad researches | Credentialed Access |
| MM | Link | Mammogram mastery: a robust dataset for breast cancer detection and medical education | Open Access |
| NLBS | Link | Full field digital mammography dataset from a population screening program | Open Access |

The JSON files of MammoVQA can be found on Google Drive; after downloading, unzip the archive and put the contents under /Benchmark/.
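As a sanity check after unzipping, the benchmark entries can be loaded with a few lines of Python. The file name and entry schema shown in the comments are assumptions for illustration; check the downloaded files for the actual layout.

```python
import json

def load_benchmark(path):
    """Load MammoVQA question entries from a benchmark JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Hypothetical usage -- the file name under /Benchmark/ is an assumption:
# entries = load_benchmark("Benchmark/MammoVQA.json")
# print(len(entries), "entries loaded")
```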

Processing code for each dataset:

| Dataset Name | Process Dataset Code |
| --- | --- |
| BMCD | https://github.com/PiggyJerry/MammoVQA/blob/main/preprocess/BMCD.ipynb |
| CBIS-DDSM | https://github.com/PiggyJerry/MammoVQA/blob/main/preprocess/CBIS-DDSM.ipynb |
| CDD-CESM | https://github.com/PiggyJerry/MammoVQA/blob/main/preprocess/CDD-CESM.ipynb |
| DMID | https://github.com/PiggyJerry/MammoVQA/blob/main/preprocess/DMID.ipynb |
| INbreast | https://github.com/PiggyJerry/MammoVQA/blob/main/preprocess/INbreast.ipynb |
| MIAS | https://github.com/PiggyJerry/MammoVQA/blob/main/preprocess/MIAS.ipynb |
| CSAW-M | https://github.com/PiggyJerry/MammoVQA/blob/main/preprocess/CSAW-M.ipynb |
| KAU-BCMD | https://github.com/PiggyJerry/MammoVQA/blob/main/preprocess/KAU-BCMD.ipynb |
| VinDr-Mammo | https://github.com/PiggyJerry/MammoVQA/blob/main/preprocess/VinDr-Mammo.ipynb |
| RSNA | https://github.com/PiggyJerry/MammoVQA/blob/main/preprocess/rsna.ipynb |
| EMBED | https://github.com/PiggyJerry/MammoVQA/blob/main/preprocess/EMBED.ipynb |

After downloading the sub-datasets above, run the corresponding processing code for each one. Remember to change the dataset path in the code!
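If you prefer to patch the notebooks programmatically rather than by hand, a small helper can rewrite a hard-coded dataset root inside a notebook's code cells. This is only a sketch: it assumes the notebook stores the dataset location as a plain string, and the placeholder paths in the usage comment are hypothetical.

```python
import json

def set_dataset_path(notebook_path, old_root, new_root):
    """Replace a hard-coded dataset root string in a .ipynb's code cells."""
    with open(notebook_path, "r", encoding="utf-8") as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            # nbformat stores cell source as a list of lines
            cell["source"] = [line.replace(old_root, new_root)
                              for line in cell["source"]]
    with open(notebook_path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)

# Hypothetical usage -- both paths are placeholders:
# set_dataset_path("preprocess/BMCD.ipynb", "/old/BMCD", "/data/BMCD")
```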

Prepare compared LVLMs

If you only want to evaluate your own model on MammoVQA, you can skip this step.

Please follow the repositories of the compared LVLMs (BLIP-2/InstructBLIP, LLaVA-Med, LLaVA-NeXT-interleave, Med-Flamingo, MedDr, MedVInT_TD, MiniGPT-4, RadFM) to prepare their weights and environments.

❗All LLM weights should be put under MammoVQA/LLM/, except the MedVInT_TD weight, which goes under MammoVQA/Sota/MedVInT_TD/results/, and the RadFM weight, which goes under MammoVQA/Sota/RadFM-main/Quick_demo/.

Finetune DiNOv2

You can finetune DiNOv2 on our MammoVQA training dataset and evaluate it on our test dataset.

  • S1. Download the DiNOv2 pre-trained weight from https://github.com/facebookresearch/dinov2, and modify the DiNOv2 weight path at line 67 of /MammoVQA/finetune/DiNOv2/models/image_encoder.py.
  • S2. Run python /MammoVQA/finetune/DiNOv2/main.py to train the model.
  • S3. Run python /MammoVQA/finetune/DiNOv2/eval.py to evaluate the model; the result file is written to /MammoVQA/Result/DiNOv2.json.
  • S4. Run python /MammoVQA/Eval/Visiononly.py to calculate metrics.
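For closed-ended questions, the core of the metric computation in a step like S4 is an exact-match accuracy over the result file, which can be sketched as below. The "prediction" and "answer" key names are assumptions for illustration, not the actual schema of the result file.

```python
def exact_match_accuracy(results):
    """Fraction of result entries whose prediction equals the ground truth.

    `results` is a list of dicts; the "prediction" and "answer" keys
    are illustrative and may differ from the real result-file schema.
    """
    if not results:
        return 0.0
    correct = sum(1 for r in results if r["prediction"] == r["answer"])
    return correct / len(results)
```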

Quick Start:

For a quick start, check the Quick_demo directory. We demonstrate a simple diagnosis case there to show how to run inference on MammoVQA with our LLaVA-Mammo.
Feel free to modify it as you want.

  • S1. Download the model checkpoint of LLaVA-Mammo and unzip it into the Quick_demo directory.
  • S2. Run python /MammoVQA/Quick_demo/eval.py to run inference; the result file is written to /MammoVQA/Result/LLaVA-Mammo.json.
  • S3. Run python /MammoVQA/Eval/LLM.py to calculate metrics.

Citation

@article{zhu2025benchmark,
  title={A Benchmark for Breast Cancer Screening and Diagnosis in Mammogram Visual Question Answering},
  author={Zhu, Jiayi and Huang, Fuxiang and Luo, Qiong and Chen, Hao},
  journal={Nature Communications},
  year={2025},
  publisher={Nature Publishing Group UK London}
}

About

The repository of our accepted paper in Nature Communications 2025: "A Benchmark for Breast Cancer Screening and Diagnosis in Mammogram Visual Question Answering"
