
# 🚀 Benchmark

This document presents LightX2V performance benchmarks across different hardware environments, with detailed comparison data for the H200 and RTX 4090 platforms.


## 🖥️ H200 Environment (~140GB VRAM)

### 📋 Software Environment Configuration

| Component     | Version     |
| ------------- | ----------- |
| Python        | 3.11        |
| PyTorch       | 2.7.1+cu128 |
| SageAttention | 2.2.0       |
| vLLM          | 0.9.2       |
| sgl-kernel    | 0.1.8       |

### 🎬 480P 5s Video Test

**Test Configuration:**

#### 📊 Performance Comparison Table

| Configuration | Inference Time (s) | GPU Memory (GB) | Speedup | Video Effect |
| --- | --- | --- | --- | --- |
| Wan2.1 Official | 366 | 71 | 1.0x | baseline.mp4 |
| FastVideo | 292 | 26 | 1.25x | fastvideo.mp4 |
| LightX2V_1 | 250 | 53 | 1.46x | output_lightx2v_wan_i2v_1.mp4 |
| LightX2V_2 | 216 | 50 | 1.70x | output_lightx2v_wan_i2v_2.mp4 |
| LightX2V_3 | 191 | 35 | 1.92x | output_lightx2v_wan_i2v_3.mp4 |
| LightX2V_3-Distill | 14 | 35 | 🏆 20.85x | distill.mp4 |
| LightX2V_4 | 107 | 35 | 3.41x | output_lightx2v_wan_i2v_4.mp4 |
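The Speedup column is the baseline inference time divided by each configuration's inference time. A quick sanity check in Python, using the non-distilled times from the table above (the published figures round slightly differently, so expect agreement within about ±0.02):

```python
# Speedup = Wan2.1 Official inference time / configuration inference time.
# Inference times (seconds) are taken from the H200 480P table above.
BASELINE = 366  # Wan2.1 Official

times = {
    "FastVideo": 292,
    "LightX2V_1": 250,
    "LightX2V_2": 216,
    "LightX2V_3": 191,
    "LightX2V_4": 107,
}

for name, seconds in times.items():
    print(f"{name}: {BASELINE / seconds:.2f}x")
```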

### 🎬 720P 5s Video Test

**Test Configuration:**

#### 📊 Performance Comparison Table

| Configuration | Inference Time (s) | GPU Memory (GB) | Speedup | Video Effect |
| --- | --- | --- | --- | --- |
| Wan2.1 Official | 974 | 81 | 1.0x | baseline.mp4 |
| FastVideo | 914 | 40 | 1.07x | fastvideo.mp4 |
| LightX2V_1 | 807 | 65 | 1.21x | output_lightx2v_wan_i2v_720_1.mp4 |
| LightX2V_2 | 751 | 57 | 1.30x | output_lightx2v_wan_i2v_720_2.mp4 |
| LightX2V_3 | 671 | 43 | 1.45x | output_lightx2v_wan_i2v_720_3.mp4 |
| LightX2V_3-Distill | 44 | 43 | 🏆 22.14x | output_lightx2v_wan_i2v_720_3_distill.mp4 |
| LightX2V_4 | 344 | 46 | 2.83x | output_lightx2v_wan_i2v_720_4.mp4 |

## 🖥️ RTX 4090 Environment (~24GB VRAM)

### 📋 Software Environment Configuration

| Component     | Version     |
| ------------- | ----------- |
| Python        | 3.9.16      |
| PyTorch       | 2.5.1+cu124 |
| SageAttention | 2.1.0       |
| vLLM          | 0.6.6       |
| sgl-kernel    | 0.0.5       |
| q8-kernels    | 0.0.0       |

### 🎬 480P 5s Video Test

**Test Configuration:**

#### 📊 Performance Comparison Table

| Configuration | Inference Time (s) | GPU Memory (GB) | Speedup | Video Effect |
| --- | --- | --- | --- | --- |
| Wan2GP(profile=3) | 779 | 20 | 1.0x | wan2gp_480p.mp4 |
| LightX2V_5 | 738 | 16 | 1.05x | lightx2v_5_480p.mp4 |
| LightX2V_5-Distill | 68 | 16 | 11.45x | lightx2v_5_distill_480p.mp4 |
| LightX2V_6 | 630 | 12 | 1.24x | lightx2v_6_480p.mp4 |
| LightX2V_6-Distill | 63 | 12 | 🏆 12.36x | lightx2v_6_distill_480p.mp4 |

### 🎬 720P 5s Video Test

**Test Configuration:**

#### 📊 Performance Comparison Table

| Configuration | Inference Time (s) | GPU Memory (GB) | Speedup | Video Effect |
| --- | --- | --- | --- | --- |
| Wan2GP(profile=3) | -- | OOM | -- | -- |
| LightX2V_5 | 2473 | 23 | -- | lightx2v_5_720p.mp4 |
| LightX2V_5-Distill | 183 | 23 | -- | lightx2v_5_distill_720p.mp4 |
| LightX2V_6 | 2169 | 18 | -- | lightx2v_6_720p.mp4 |
| LightX2V_6-Distill | 171 | 18 | -- | lightx2v_6_distill_720p.mp4 |

> Note: Wan2GP(profile=3) runs out of memory (OOM) at 720P on the RTX 4090, so no baseline is available and speedup ratios are not reported.

## 📖 Configuration Descriptions

### 🖥️ H200 Environment Configuration Descriptions

| Configuration | Technical Features |
| --- | --- |
| Wan2.1 Official | Original implementation from the official Wan2.1 repository |
| FastVideo | Based on the official FastVideo repository, using the SageAttention2 backend optimization |
| LightX2V_1 | Replaces native attention with SageAttention2 and runs the DiT in BF16 with FP32 for a few sensitive layers, improving computational efficiency while preserving precision |
| LightX2V_2 | Unified BF16 computation, further reducing memory usage and computational overhead while maintaining generation quality |
| LightX2V_3 | Adds FP8 quantization to significantly lower precision requirements, combined with Tiling VAE to reduce memory usage |
| LightX2V_3-Distill | LightX2V_3 with a 4-step distillation model (infer_steps=4, enable_cfg=False), further reducing inference steps while maintaining generation quality |
| LightX2V_4 | LightX2V_3 plus TeaCache (teacache_thresh=0.2) cache reuse, accelerating inference by intelligently skipping redundant computation |
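For illustration, the distillation and TeaCache settings named above can be written as a config fragment. This is a sketch only: `infer_steps`, `enable_cfg`, and `teacache_thresh` come from the descriptions above, but the surrounding structure is an assumption — consult the JSON files under configs/bench for the real schema.

```python
import json

# Sketch of the distillation and TeaCache knobs named above.
# Anything beyond infer_steps / enable_cfg / teacache_thresh is an
# assumption; see configs/bench for the actual configuration schema.
distill_fragment = {
    "infer_steps": 4,     # 4-step distillation model
    "enable_cfg": False,  # classifier-free guidance disabled
}

teacache_fragment = {
    "teacache_thresh": 0.2,  # higher threshold -> more steps reused/skipped
}

print(json.dumps({**distill_fragment, **teacache_fragment}, indent=2))
```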

### 🖥️ RTX 4090 Environment Configuration Descriptions

| Configuration | Technical Features |
| --- | --- |
| Wan2GP(profile=3) | Based on the Wan2GP repository with MMGP optimization. The profile=3 setting targets RTX 3090/4090 machines with at least 32 GB RAM and 24 GB VRAM, adapting to limited RAM by trading off VRAM. Uses quantized models (480P and 720P) |
| LightX2V_5 | Replaces native attention with SageAttention2 and runs the DiT in FP8 with FP32 for a few sensitive layers; CPU offload is enabled, asynchronously moving DiT inference data to the CPU at block-level granularity to save VRAM |
| LightX2V_5-Distill | LightX2V_5 with a 4-step distillation model (infer_steps=4, enable_cfg=False), further reducing inference steps while maintaining generation quality |
| LightX2V_6 | LightX2V_3 with CPU offload enabled: sensitive layers run in FP32, and DiT inference data is asynchronously offloaded to the CPU at block-level granularity to save VRAM |
| LightX2V_6-Distill | LightX2V_6 with a 4-step distillation model (infer_steps=4, enable_cfg=False), further reducing inference steps while maintaining generation quality |
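Block-level CPU offload, as used by LightX2V_5/6, keeps only the DiT block that is currently executing on the GPU. The idea can be sketched in PyTorch as a simplified synchronous loop (LightX2V's actual implementation offloads asynchronously; the linear layers here are stand-ins for real DiT blocks):

```python
import torch

# Illustrative sketch of block-level CPU offload: weights live in CPU
# memory, and each block is moved to the accelerator only while it runs.
# This is a simplification, not LightX2V's actual offload implementation.
device = "cuda" if torch.cuda.is_available() else "cpu"

blocks = [torch.nn.Linear(64, 64) for _ in range(4)]  # stand-in DiT blocks

def forward_with_offload(x: torch.Tensor) -> torch.Tensor:
    x = x.to(device)
    for block in blocks:
        block.to(device)   # load this block's weights onto the accelerator
        x = block(x)
        block.to("cpu")    # release accelerator memory before the next block
    return x

out = forward_with_offload(torch.randn(2, 64))
print(out.shape)  # torch.Size([2, 64])
```

The trade-off is visible in the tables above: offload configurations use noticeably less VRAM at the cost of extra transfer time, which asynchronous block-level scheduling helps hide.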

πŸ“ Configuration Files Reference

Benchmark-related configuration files and execution scripts are available at:

| Type | Link | Description |
| --- | --- | --- |
| Configuration Files | configs/bench | JSON files with the various optimization configurations |
| Execution Scripts | scripts/bench | Benchmark execution scripts |

> 💡 **Tip**: Choose the optimization scheme that matches your hardware configuration to get the best performance.