DDG: Diffusion-Distilled Generalization

Official PyTorch implementation of "Implicit Diffusion Distillation with Domain Disentanglement for Domain Generalization" (ICME 2026).

Authors: Yujun Tong, Dongliang Chang*, Junhan Chen, Yuanchen Fang, and Zhanyu Ma

Affiliation: Beijing University of Posts and Telecommunications (BUPT)


Overview

DDG (Diffusion-Distilled Generalization) is a novel framework that leverages frozen diffusion models to address the challenge of domain generalization. Instead of relying on computationally expensive data augmentation, we propose an efficient end-to-end distillation paradigm where the visual backbone is optimized as a conditioning agent to guide a frozen diffusion teacher in reconstructing source images.

Key Features

  • Generative Inverse Problem Paradigm: Distills structure-aware knowledge from frozen Stable Diffusion models into discriminative backbones
  • Domain-Debiasing Guidance: Employs Classifier-Free Guidance (CFG) to explicitly suppress domain-specific style biases
  • State-of-the-Art Performance: Achieves 79.57% average accuracy on the DomainBed benchmark with CLIP ViT-B/16 initialization
  • Two-Stage Training: Stage I aligns features with CLIP text space; Stage II performs generative distillation
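
The Classifier-Free Guidance mentioned in the feature list combines an unconditional and a conditional noise prediction. The sketch below shows only the standard CFG combination rule on plain Python lists; how DDG constructs the two conditions to suppress domain style is not described in this README, and the function name `cfg_combine` and the guidance scale `w` are illustrative assumptions.

```python
def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond).

    w = 0 returns the unconditional prediction, w = 1 the conditional one,
    and w > 1 extrapolates away from the unconditional prediction.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions (hypothetical values, not model outputs).
eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [1.0, 1.0, 0.0]
guided = cfg_combine(eps_uncond, eps_cond, w=2.0)
print(guided)
```

In a real diffusion pipeline the same rule is applied elementwise to noise tensors rather than lists.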

Motivation

Traditional methods rely on scarce source data, leading to biased alignment where learned representations fail to cover unseen target distributions. Our DDG framework leverages the robust generative prior of a frozen diffusion model to effectively expand the feature support, bridging the gap to unseen domains.

Method Architecture

The framework operates in two stages:

  1. Stage I - Semantic Alignment: Aligns visual features with frozen CLIP text embeddings
  2. Stage II - Generative Distillation: Optimizes the backbone to maximize image likelihood under the diffusion prior while suppressing domain-specific styles

Installation

Requirements

  • Python 3.8+
  • PyTorch 2.9.1+
  • CUDA 12.8+ (for GPU support)

Setup

# Install dependencies
pip install -r requirements.txt

# Install CLIP from source
pip install git+https://github.com/openai/CLIP.git

Pre-trained Models

The code requires the following pre-trained models:

  1. CLIP Model: openai/clip-vit-large-patch14
  2. Stable Diffusion v1.4: CompVis/stable-diffusion-v1-4

You need to manually download these models and update the paths in the code:

Step 1: Download models from Hugging Face and place them in your preferred directory.

Step 2: Update the model paths in domainbed/algorithms/distill_diffusion.py:

# Line 131: CLIP model path
clip_model_id = hparams.get("clip_model_path", "/path/to/clip-vit-large-patch14")

Step 3: Update the Stable Diffusion model path in domainbed/networks/build.py:

# Line 152: Stable Diffusion v1.4 path
model_id = "/path/to/stable-diffusion-v1-4"

Note: The config_tta.yaml file is already included in the repository root directory and does not need to be modified.
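
One possible way to do the manual download in Step 1 is the `huggingface_hub` package. This is a sketch rather than part of the repository: the helper names, the local directory layout, and the `/path/to/models` root are assumptions, and `download_all` needs network access plus `pip install huggingface_hub`.

```python
from pathlib import Path

# Repository IDs are taken from the model list above; the local folder names are illustrative.
MODEL_REPOS = {
    "clip-vit-large-patch14": "openai/clip-vit-large-patch14",
    "stable-diffusion-v1-4": "CompVis/stable-diffusion-v1-4",
}


def plan_downloads(root):
    """Map each Hugging Face repo ID to a target directory under root."""
    root = Path(root)
    return {repo_id: root / name for name, repo_id in MODEL_REPOS.items()}


def download_all(root):
    """Fetch every model snapshot into its planned directory (requires network)."""
    from huggingface_hub import snapshot_download  # imported lazily so planning works offline

    for repo_id, target in plan_downloads(root).items():
        snapshot_download(repo_id=repo_id, local_dir=str(target))


if __name__ == "__main__":
    # Print the plan only; call download_all("/path/to/models") to actually fetch the files.
    for repo_id, target in plan_downloads("/path/to/models").items():
        print(f"{repo_id} -> {target}")
```

The printed target directories are the values to substitute for the `/path/to/...` placeholders in Steps 2 and 3.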


Dataset Preparation

We evaluate on five benchmarks from DomainBed:

  • PACS: 4 domains (Photo, Art, Cartoon, Sketch)
  • VLCS: 4 domains (VOC2007, LabelMe, Caltech101, Sun09)
  • TerraIncognita: 4 domains (L100, L38, L43, L46)
  • OfficeHome: 4 domains (Art, Clipart, Product, Real World)
  • DomainNet: 6 domains (Clipart, Infograph, Painting, Quickdraw, Real, Sketch)

Please follow the DomainBed data preparation guide to download and organize the datasets. Update the --data_dir argument in training scripts to point to your dataset directory.


Training

Quick Start

Train on the PACS dataset with a ViT-B/16 student backbone:

bash scripts/clipvitb_student/run.sh 0 1

Arguments:

  • 0: GPU ID
  • 1: Dataset index (0=OfficeHome, 1=PACS, 2=VLCS, 3=TerraIncognita, 4=DomainNet)
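
The positional arguments can be wrapped in a small helper that validates the dataset index before launching. This is a hypothetical convenience wrapper, not part of the repository; it only mirrors the index-to-dataset mapping listed above and does not reflect what `run.sh` does internally.

```python
# Index order as documented in the argument list above.
DATASETS = ["OfficeHome", "PACS", "VLCS", "TerraIncognita", "DomainNet"]


def run_command(gpu_id, dataset_index, script="scripts/clipvitb_student/run.sh"):
    """Return (argv, dataset_name) for the quick-start script."""
    if not 0 <= dataset_index < len(DATASETS):
        raise ValueError(f"dataset index must be in 0..{len(DATASETS) - 1}")
    argv = ["bash", script, str(gpu_id), str(dataset_index)]
    return argv, DATASETS[dataset_index]


argv, name = run_command(gpu_id=0, dataset_index=1)
print(name, " ".join(argv))
# To launch: import subprocess; subprocess.run(argv, check=True)
```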

Custom Training

python train_all.py vitb-CLIP \
    --clip_backbone "ViT-L/14" \
    --backbone "clip_vit-b16" \
    --algorithm DistillDiffusion \
    --dataset PACS \
    --data_dir /path/to/datasets \
    --stage1_steps 5000 \
    --swadstart_steps 5000 \
    --steps 8000 \
    --lmd 0.5 \
    --seed 0 \
    --swad True

Key Hyperparameters

  • --stage1_steps: Number of steps for Stage I (semantic alignment), default: 5000
  • --swadstart_steps: Step to start SWAD model averaging, default: 5000
  • --steps: Total training steps, default: 8000
  • --lmd: Weight for alignment loss (λ in paper), default: 0.5
  • --clip_backbone: CLIP text encoder architecture used to condition the diffusion model; it must match the text encoder of the Stable Diffusion checkpoint (ViT-L/14 for Stable Diffusion v1.4), options: ViT-B/16, ViT-L/14
  • --backbone: Student visual backbone architecture, options: clip_vit-b16, vit-base, resnet50
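
To see how the three step-count options interact, the sketch below maps a training step to its phase. It assumes 0-based step indices, that Stage II starts exactly at `--stage1_steps`, and that SWAD averaging is active from `--swadstart_steps` onward; the actual boundary handling in the training loop may differ, and `training_phase` is a hypothetical helper.

```python
def training_phase(step, stage1_steps=5000, swadstart_steps=5000, steps=8000):
    """Return (stage, swad_active) for a 0-based training step."""
    if not 0 <= step < steps:
        raise ValueError(f"step must be in 0..{steps - 1}")
    stage = "stage1" if step < stage1_steps else "stage2"
    swad_active = step >= swadstart_steps
    return stage, swad_active


for step in (0, 4999, 5000, 7999):
    print(step, training_phase(step))
```

Under these assumptions, the defaults give 5000 steps of Stage I followed by 3000 steps of Stage II, with SWAD averaging covering exactly the Stage II steps.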

Results

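Performance on DomainBed (CLIP ViT-B/16 Initialization, accuracy in %)

| Method      | OfficeHome | TerraIncognita | VLCS  | PACS  | DomainNet | Avg   |
|-------------|------------|----------------|-------|-------|-----------|-------|
| VL2V-SD     | 87.38      | 58.54          | 83.25 | 96.68 | 62.79     | 77.73 |
| DDG (Paper) | 87.90      | 65.37          | 84.43 | 96.83 | 63.31     | 79.57 |
| DDG (A800)  | 87.74      | 65.44          | 84.04 | 96.97 | 63.63     | 79.56 |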

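Note: The results in the "DDG (A800)" row were obtained by re-running the experiments on A800 GPUs and may differ slightly from the paper due to hardware variations. The reproduced results are within 0.4 points of the paper on every dataset (higher on TerraIncognita, PACS, and DomainNet; lower on OfficeHome and VLCS), and the average is 79.56% versus 79.57% in the paper.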
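
The Avg column can be checked as the unweighted mean of the five per-dataset accuracies. The snippet below recomputes it from the numbers in the table; the dictionary layout and the helper name are illustrative.

```python
# Per-dataset accuracies in table order: OfficeHome, TerraIncognita, VLCS, PACS, DomainNet,
# followed by the Avg value reported in the table.
RESULTS = {
    "VL2V-SD": ([87.38, 58.54, 83.25, 96.68, 62.79], 77.73),
    "DDG (Paper)": ([87.90, 65.37, 84.43, 96.83, 63.31], 79.57),
    "DDG (A800)": ([87.74, 65.44, 84.04, 96.97, 63.63], 79.56),
}


def mean_accuracy(scores):
    """Unweighted mean over datasets, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for method, (scores, reported) in RESULTS.items():
    computed = mean_accuracy(scores)
    status = "matches" if abs(computed - reported) < 1e-9 else "differs from"
    print(f"{method}: computed {computed:.2f} {status} reported {reported:.2f}")
```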


Trained Model Checkpoints

Trained model checkpoints will be available soon.

Download links: Coming soon


Citation

If you find this work helpful, please consider citing:

@inproceedings{tong2026ddg,
  title={Implicit Diffusion Distillation with Domain Disentanglement for Domain Generalization},
  author={Tong, Yujun and Chang, Dongliang and Chen, Junhan and Fang, Yuanchen and Ma, Zhanyu},
  booktitle={Proceedings of the IEEE International Conference on Multimedia \& Expo (ICME)},
  year={2026}
}

Acknowledgements

This codebase is built upon the following excellent projects:

We sincerely thank the authors for their open-source contributions.


License

This project is licensed under the MIT License - see the LICENSE file for details.


Contact

For questions or issues, please open an issue on GitHub or contact:
