DDG: Diffusion-Distilled Generalization

Official PyTorch implementation of "Implicit Diffusion Distillation with Domain Disentanglement for Domain Generalization" (ICME 2026).

Authors: Yujun Tong, Dongliang Chang*, Junhan Chen, Yuanchen Fang, and Zhanyu Ma

Affiliation: Beijing University of Posts and Telecommunications (BUPT)

Overview

DDG (Diffusion-Distilled Generalization) is a novel framework that leverages frozen diffusion models to address the challenge of domain generalization. Instead of relying on computationally expensive data augmentation, we propose an efficient end-to-end distillation paradigm where the visual backbone is optimized as a conditioning agent to guide a frozen diffusion teacher in reconstructing source images.

Key Features

Generative Inverse Problem Paradigm: Distills structure-aware knowledge from frozen Stable Diffusion models into discriminative backbones
Domain-Debiasing Guidance: Employs Classifier-Free Guidance (CFG) to explicitly suppress domain-specific style biases
State-of-the-Art Performance: Achieves 79.57% average accuracy on the DomainBed benchmark with CLIP ViT-B/16 initialization
Two-Stage Training: Stage I aligns features with CLIP text space; Stage II performs generative distillation

Motivation

Traditional methods rely on scarce source data, leading to biased alignment where learned representations fail to cover unseen target distributions. Our DDG framework leverages the robust generative prior of a frozen diffusion model to effectively expand the feature support, bridging the gap to unseen domains.

Method Architecture

The framework operates in two stages:

Stage I - Semantic Alignment: Aligns visual features with frozen CLIP text embeddings
Stage II - Generative Distillation: Optimizes the backbone to maximize image likelihood under the diffusion prior while suppressing domain-specific styles
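
As a rough illustration of the guidance step named under Key Features: the standard classifier-free guidance rule combines an unconditional and a conditional noise prediction as eps_uncond + scale * (eps_cond - eps_uncond). The sketch below shows only that generic rule on plain Python lists. The function name, the guidance scale, and the toy values are assumptions made for illustration; how DDG chooses its conditioning signals to suppress domain-specific style is defined by the paper and the repository code, not by this snippet.

```python
from typing import List


def cfg_combine(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Generic classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions (illustrative values only, not model outputs).
uncond = [0.0, 1.0, -1.0]
cond = [1.0, 1.0, 0.0]

# scale = 1.0 recovers the conditional prediction; larger scales extrapolate past it.
print(cfg_combine(uncond, cond, 1.0))
print(cfg_combine(uncond, cond, 2.0))
```

With scale 0 the rule returns the unconditional prediction unchanged, which is a quick sanity check for any implementation.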

Installation

Requirements

Python 3.8+
PyTorch 2.9.1+
CUDA 12.8+ (for GPU support)

Setup

# Install dependencies
pip install -r requirements.txt

# Install CLIP from source
pip install git+https://github.com/openai/CLIP.git

Pre-trained Models

The code requires the following pre-trained models:

CLIP Model: openai/clip-vit-large-patch14
Stable Diffusion v1.4: CompVis/stable-diffusion-v1-4

You need to manually download these models and update the paths in the code:

Step 1: Download models from Hugging Face and place them in your preferred directory.

Step 2: Update the model paths in domainbed/algorithms/distill_diffusion.py:

# Line 131: CLIP model path
clip_model_id = hparams.get("clip_model_path", "/path/to/clip-vit-large-patch14")

Step 3: Update the Stable Diffusion model path in domainbed/networks/build.py:

# Line 152: Stable Diffusion v1.4 path
model_id = "/path/to/stable-diffusion-v1-4"

Note: The config_tta.yaml file is already included in the repository root directory and does not need to be modified.
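
For Step 1, one possible way to fetch both models is the snapshot_download function from the huggingface_hub package. This is a sketch rather than part of the repository: it assumes huggingface_hub is installed and that network access is available, and the helper names and the folder layout (one folder per model, named after the last part of the repository ID) are choices made here for illustration.

```python
from pathlib import Path
from typing import Dict

# Repository IDs as listed in this README.
MODEL_REPOS = {
    "clip": "openai/clip-vit-large-patch14",
    "stable_diffusion": "CompVis/stable-diffusion-v1-4",
}


def plan_model_dirs(root: str) -> Dict[str, Path]:
    """Map each model to a local folder named after the last part of its repo ID."""
    return {name: Path(root) / repo.split("/")[-1] for name, repo in MODEL_REPOS.items()}


def download_models(root: str) -> Dict[str, Path]:
    """Download both models; needs `pip install huggingface_hub` and network access."""
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    targets = plan_model_dirs(root)
    for name, target in targets.items():
        snapshot_download(repo_id=MODEL_REPOS[name], local_dir=str(target))
    return targets


if __name__ == "__main__":
    # Only prints the planned folders; call download_models(...) to actually fetch.
    for name, path in plan_model_dirs("/path/to/models").items():
        print(f"{name}: {path}")
```

The resulting folders are the values to put into the two paths described in Steps 2 and 3.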

Dataset Preparation

We evaluate on five benchmarks from DomainBed:

PACS: 4 domains (Photo, Art, Cartoon, Sketch)
VLCS: 4 domains (VOC2007, LabelMe, Caltech101, Sun09)
TerraIncognita: 4 domains (L100, L38, L43, L46)
OfficeHome: 4 domains (Art, Clipart, Product, Real World)
DomainNet: 6 domains (Clipart, Infograph, Painting, Quickdraw, Real, Sketch)

Please follow the DomainBed data preparation guide to download and organize the datasets. Update the --data_dir argument in training scripts to point to your dataset directory.
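
Before training, it can help to check that a dataset folder contains the expected number of domains. The sketch below assumes one subfolder per domain inside each dataset folder; that layout, the helper names, and the demo folder names are assumptions, so adjust them to the layout produced by the DomainBed preparation guide. The domain counts come from the list above.

```python
import os
import tempfile
from typing import List

# Number of domains per benchmark, as listed in this README.
EXPECTED_DOMAINS = {"PACS": 4, "VLCS": 4, "TerraIncognita": 4, "OfficeHome": 4, "DomainNet": 6}


def list_domains(dataset_dir: str) -> List[str]:
    """Return the sorted names of the immediate subfolders of dataset_dir."""
    return sorted(
        entry for entry in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, entry))
    )


def has_expected_domains(dataset: str, dataset_dir: str) -> bool:
    """True if dataset_dir holds as many subfolders as the benchmark has domains."""
    return len(list_domains(dataset_dir)) == EXPECTED_DOMAINS[dataset]


# Self-contained demo on a throwaway directory that mimics a four-domain layout.
with tempfile.TemporaryDirectory() as root:
    for domain in ["photo", "art_painting", "cartoon", "sketch"]:  # assumed folder names
        os.makedirs(os.path.join(root, domain))
    print(list_domains(root))
    print(has_expected_domains("PACS", root))
```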

Training

Quick Start

Train on the PACS dataset with a ViT-B/16 student backbone:

bash scripts/clipvitb_student/run.sh 0 1

Arguments:

0: GPU ID
1: Dataset index (0=OfficeHome, 1=PACS, 2=VLCS, 3=TerraIncognita, 4=DomainNet)

Custom Training

python train_all.py vitb-CLIP \
    --clip_backbone "ViT-L/14" \
    --backbone "clip_vit-b16" \
    --algorithm DistillDiffusion \
    --dataset PACS \
    --data_dir /path/to/datasets \
    --stage1_steps 5000 \
    --swadstart_steps 5000 \
    --steps 8000 \
    --lmd 0.5 \
    --seed 0 \
    --swad True
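
To launch the same configuration on several datasets or seeds, the command above can be assembled programmatically. The following is a sketch: the helper name and the seed range are assumptions, while the flags and the index-to-dataset mapping are copied from this README.

```python
import shlex
from typing import List

# Index-to-dataset mapping documented for scripts/clipvitb_student/run.sh.
DATASETS = {0: "OfficeHome", 1: "PACS", 2: "VLCS", 3: "TerraIncognita", 4: "DomainNet"}


def build_command(dataset_index: int, data_dir: str, seed: int = 0) -> List[str]:
    """Assemble the train_all.py call shown above for one dataset and seed."""
    return [
        "python", "train_all.py", "vitb-CLIP",
        "--clip_backbone", "ViT-L/14",
        "--backbone", "clip_vit-b16",
        "--algorithm", "DistillDiffusion",
        "--dataset", DATASETS[dataset_index],
        "--data_dir", data_dir,
        "--stage1_steps", "5000",
        "--swadstart_steps", "5000",
        "--steps", "8000",
        "--lmd", "0.5",
        "--seed", str(seed),
        "--swad", "True",
    ]


# Print one command per seed (seeds 0-2 are an arbitrary choice for this example).
for seed in range(3):
    print(shlex.join(build_command(1, "/path/to/datasets", seed)))
```

Each returned list can be passed to subprocess.run(...) from the repository root to start the run.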

Key Hyperparameters

--stage1_steps: Number of steps for Stage I (semantic alignment), default: 5000
--swadstart_steps: Step to start SWAD model averaging, default: 5000
--steps: Total training steps, default: 8000
--lmd: Weight for alignment loss (λ in paper), default: 0.5
--clip_backbone: CLIP text encoder architecture used to condition the diffusion model; it must match the text encoder of the Stable Diffusion checkpoint (ViT-L/14 for Stable Diffusion v1.4), options: ViT-B/16, ViT-L/14
--backbone: Student visual backbone architecture, options: clip_vit-b16, vit-base, resnet50

Results

Performance on DomainBed (CLIP ViT-B/16 Initialization)

Method	OfficeHome	TerraIncognita	VLCS	PACS	DomainNet	Avg
VL2V-SD	87.38	58.54	83.25	96.68	62.79	77.73
DDG (Paper)	87.90	65.37	84.43	96.83	63.31	79.57
DDG (A800)	87.74	65.44	84.04	96.97	63.63	79.56

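Note: The results in the "DDG (A800)" row were obtained by re-running the experiments on A800 GPUs and differ slightly from the paper due to hardware variations. The reproduced numbers are within 0.4 points of the paper's on every benchmark (higher on TerraIncognita, PACS, and DomainNet; lower on OfficeHome and VLCS), and the averages are nearly identical (79.56 vs. 79.57).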
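
The Avg column and the gap between the two DDG rows can be checked directly from the table. The short script below recomputes both; the values are copied from the table, and the helper names are only illustrative.

```python
# Accuracy (%) per benchmark, copied from the table above, in the order
# OfficeHome, TerraIncognita, VLCS, PACS, DomainNet.
ROWS = {
    "VL2V-SD": [87.38, 58.54, 83.25, 96.68, 62.79],
    "DDG (Paper)": [87.90, 65.37, 84.43, 96.83, 63.31],
    "DDG (A800)": [87.74, 65.44, 84.04, 96.97, 63.63],
}


def average(values):
    """Arithmetic mean rounded to two decimals, as in the Avg column."""
    return round(sum(values) / len(values), 2)


def deltas(a, b):
    """Per-benchmark difference a - b, rounded to two decimals."""
    return [round(x - y, 2) for x, y in zip(a, b)]


for name, values in ROWS.items():
    print(name, average(values))

# Positive entries mean the A800 re-run is higher than the paper's number.
print(deltas(ROWS["DDG (A800)"], ROWS["DDG (Paper)"]))
```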

Trained Model Checkpoints

Trained model checkpoints will be available soon.

Download links: Coming soon

Citation

If you find this work helpful, please consider citing:

@inproceedings{tong2026ddg,
  title={Implicit Diffusion Distillation with Domain Disentanglement for Domain Generalization},
  author={Tong, Yujun and Chang, Dongliang and Chen, Junhan and Fang, Yuanchen and Ma, Zhanyu},
  booktitle={Proceedings of the IEEE International Conference on Multimedia \& Expo (ICME)},
  year={2026}
}

Acknowledgements

This codebase builds on several open-source projects. We sincerely thank their authors for their contributions.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For questions or issues, please open an issue on GitHub or contact:

Yujun Tong: tongyujun@bupt.edu.cn
Dongliang Chang: changdongliang@bupt.edu.cn
