Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching

WACV 2026


Patchify is a training-free framework for instance-level image retrieval. Gallery images are decomposed into a spatial pyramid of patches, and query embeddings are matched against local patch features from a pretrained vision encoder (SigLIP, DINOv2, CLIP). The approach achieves high retrieval and localization performance without any fine-tuning.
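The core matching step can be sketched in plain Python. Everything below is illustrative (toy 4-dimensional vectors stand in for real encoder features; `best_patch` is not this repository's API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_patch(query_vec, patch_vecs):
    """Return (index, score) of the gallery patch most similar to the query."""
    scores = [cosine(query_vec, p) for p in patch_vecs]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]

# Toy example: 3 gallery patches, one nearly parallel to the query direction.
query = [1.0, 0.0, 0.5, 0.2]
patches = [[0.0, 1.0, 0.0, 0.0],
           [0.9, 0.1, 0.45, 0.2],    # close to the query direction
           [-1.0, 0.0, -0.5, -0.2]]
idx, score = best_patch(query, patches)   # idx == 1
```

Because the best-matching patch carries a known position in the gallery image's grid, the same argmax that ranks images also localizes the instance.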


Installation

We recommend Python 3.10 with CUDA 11.8.

# 1. Clone the repository
git clone https://github.com/kaist-ami/Patchwise-Retrieval.git
cd Patchwise-Retrieval

# 2. Create and activate a conda environment
conda create -n ssr python=3.10 -y
conda activate ssr

# 3. Install dependencies
pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

Key packages from requirements.txt:

timm>=0.9.0         # pretrained vision models (SigLIP, DINOv2, CLIP)
faiss-gpu>=1.7.0    # approximate nearest-neighbour search
numpy, Pillow, tqdm, omegaconf, hydra-core, loguru, matplotlib, pyyaml

Dataset Preparation

We evaluate on ILIAS (Instance-Level Image retrieval At Scale). Download the ILIAS core set from the official ILIAS website and place it under ilias/ilias_core/.

ilias/
├── ilias_core/          # images (4,885 gallery + 1,232 query)  ← download here
├── query.json           # query annotations (included)
└── gallery.json         # gallery annotations (included)

query.json and gallery.json are already included in this repository with relative paths pre-configured.


Quick Start

All scripts are in the scripts/ directory. On the first run, patch features are automatically extracted and cached under ./features/. Trained FAISS indices are cached under ./faiss_indices/. Logs are written to ./logs/.

Exact Search (no PQ)

bash scripts/run_eval.sh

Compressed Search (IVF-PQ)

bash scripts/run_eval_pq.sh

Reproducing Paper Results

Exact Search (no PQ)

python main.py \
    --config src/eval_retrieval.py \
    --model_name siglip \
    --multi-scale pyramid --crop True \
    --pq False \
    --tile_size_res 3 \
    --compute_metrics "mAP, locscore" --iou_threshold default

Change --tile_size_res to evaluate at different patchify levels (0 = global image, 3 = 4×4 patch grid).
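Under the level convention above (level ℓ yields an (ℓ+1)×(ℓ+1) grid), the per-level and cumulative patch counts work out as follows, assuming the pyramid keeps every level from L0 up to the chosen --tile_size_res:

```python
def grid_side(level):
    """Side length of the patch grid at a patchify level (L0 -> 1x1, L3 -> 4x4)."""
    return level + 1

def patches_at(level):
    """Number of patches at a single level."""
    return grid_side(level) ** 2

def pyramid_patches(max_level):
    """Total patches when every level from L0 up to max_level is kept."""
    return sum(patches_at(l) for l in range(max_level + 1))

counts = [patches_at(l) for l in range(4)]   # [1, 4, 9, 16]
total = pyramid_patches(3)                   # 30 patch embeddings per gallery image
```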

| Model | Level | mAP | LocScore |
| --- | --- | --- | --- |
| SigLIP | L3 (4×4) | 64.37% | 19.24% |

Compressed Search (IVF-PQ)

IVF-PQ compresses the gallery index by ~13× with a small accuracy trade-off.
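As a back-of-envelope check of the storage savings: a raw float32 vector costs 4·d bytes, while an IVF-PQ entry costs roughly m·nbits/8 bytes of PQ code plus per-vector ID overhead in the inverted lists. The numbers below are illustrative, not the paper's exact setting (the reported ~13× presumably also accounts for index structures):

```python
def pq_compression_ratio(d, m, nbits, id_bytes=8):
    """Ratio of raw float32 storage to approximate IVF-PQ storage per vector.

    d        : feature dimension
    m        : number of PQ sub-quantizers
    nbits    : bits per sub-code
    id_bytes : per-vector ID overhead stored in the inverted lists
    """
    raw_bytes = 4 * d                        # float32 vector
    code_bytes = m * nbits / 8 + id_bytes    # PQ code + vector id
    return raw_bytes / code_bytes

# Illustrative: d=768, m=64, nbits=8 -> 3072 bytes vs ~72 bytes per vector.
ratio = pq_compression_ratio(d=768, m=64, nbits=8)
```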

python main.py \
    --config src/eval_retrieval.py \
    --model_name siglip \
    --multi-scale pyramid --crop True \
    --pq ivfpq \
    --pq_m 64 --pq_nlist 8192 --pq_nbits 8 --pq_nprobe 8192 \
    --pq_dist_type IP --pq_train_res 1 \
    --tile_size_res 3 \
    --compute_metrics "mAP, locscore" --iou_threshold default

| Method | mAP | LocScore |
| --- | --- | --- |
| Exact search | 64.37% | 19.24% |
| IVF-PQ | 59.30% | 17.44% |

The paper reports 59.96% mAP for L3 with PQ. The 0.66 pp gap is consistent with the small difference in the base no-PQ result (our 64.37% vs. the paper's 65.16%).

Key insight — why --pq_train_res 1: Training the IVF quantizer on coarser L1 features (rather than L3 self-training) produces more globally representative centroids, which yields smaller residuals at L3 search time and better PQ quantization accuracy.
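For intuition: an IVF-PQ index encodes each vector's residual to its assigned coarse centroid, so centroids that cover the data well leave smaller residuals for PQ to encode. A 1-D toy illustration in pure Python (names illustrative, not the repository's code):

```python
def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - x))

def mean_abs_residual(centroids, data):
    """Average magnitude of the residuals IVF-PQ would have to encode."""
    return sum(abs(x - centroids[nearest(centroids, x)]) for x in data) / len(data)

data = [0.1, 0.2, 0.9, 1.0, 1.9, 2.0]
good = [0.15, 0.95, 1.95]   # centroids that cover the data well
bad = [0.0, 0.1, 0.2]       # centroids crowded into one region

# Well-spread centroids leave far smaller residuals to quantize.
assert mean_abs_residual(good, data) < mean_abs_residual(bad, data)
```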


CLI Reference

python main.py [OPTIONS]
| Argument | Default | Description |
| --- | --- | --- |
| --config | src/eval_retrieval.py | Evaluator config file |
| --model_name | siglip | Backbone: siglip, dinov2, clip |
| --tile_size_res | 3 | Patchify level: 0 = L0 (global) … 3 = L3 (4×4 grid) |
| --multi-scale | pyramid | Multi-scale strategy (pyramid) |
| --crop | True | Use crop-based patch extraction |
| --image_size | 0 | Override the model's native input resolution (0 = native) |
| --split | test | Dataset split |
| --compute_metrics | "mAP, locscore" | Metrics: mAP, top-k, locscore |
| --iou_threshold | average | IoU mode: default (0.4) or average (0.2–0.6) |
| --visualize | False | Save top-k retrieval visualizations |
| --batch_size | 16 | Batch size for gallery encoding |

PQ options

| Argument | Default | Description |
| --- | --- | --- |
| --pq | False | PQ mode: False, pq, or ivfpq |
| --pq_m | 16 | Number of PQ sub-quantizers (must divide the feature dimension) |
| --pq_nlist | 256 | Number of IVF clusters |
| --pq_nprobe | 256 | Clusters probed at search time (set equal to nlist for exhaustive search) |
| --pq_nbits | 8 | Bits per sub-code (typically 8) |
| --pq_dist_type | IP | Distance metric: IP (inner product) or L2 |
| --pq_train_res | -1 | Resolution level for quantizer training: -1 = same as tile_size_res, -2 = all levels combined |

Project Structure

Patchwise-Retrieval/
├── main.py                    # Evaluation entry point
├── requirements.txt
│
├── ilias/                     # ILIAS benchmark data
│   ├── ilias_core/            # Images (download separately)
│   ├── query.json             # Query annotations (bounding boxes, image IDs)
│   └── gallery.json           # Gallery annotations
│
├── src/
│   ├── config.py              # Model loading (timm), LazyConfig
│   ├── eval_retrieval.py      # Evaluator config (dataset path, metrics)
│   ├── dataset.py             # Ilias dataset class + DatasetCatalog
│   ├── pyramid_embedding.py   # Multi-scale patch extraction
│   ├── retrieval.py           # RegionalImageRetrievalEvaluator, FAISS index
│   ├── metrics.py             # mAP, LocScore computation
│   ├── util.py                # Helpers (str2bool, set_seed, save/load)
│   └── visualization.py       # Top-k result rendering
│
├── scripts/
│   ├── run_eval.sh            # SigLIP L3 exact search
│   ├── run_eval_pq.sh         # SigLIP L3 IVF-PQ
│   └── run_table1.sh          # DINOv2 / CLIP reproduction
│
├── features/                  # Auto-generated patch feature cache
├── faiss_indices/             # Auto-generated trained FAISS indices
└── logs/                      # Evaluation logs

Citation

@inproceedings{choi2026PatchwiseRetrieval,
  title={Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching},
  author={Wonseok Choi and Sohwi Lim and Nam Hyeon-Woo and Moon Ye-Bin and Dong-Ju Jeong and Jinyoung Hwang and Tae-Hyun Oh},
  booktitle={Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  year={2026}
}

Reference: ILIAS benchmark — https://github.com/ilias-vrg/ilias
