This document presents LightX2V performance test results across different hardware environments, including detailed comparison data for the H200 and RTX 4090 platforms.
Uses SageAttention2 in place of the native attention mechanism and adopts DIT BF16+FP32 mixed-precision computation (FP32 for a few precision-sensitive layers), improving computational efficiency while maintaining precision
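The mixed-precision scheme above can be sketched as a per-layer dtype policy. The layer names and the `SENSITIVE_LAYERS` set below are illustrative assumptions, not LightX2V's actual configuration:

```python
# Hypothetical sketch of selective mixed precision: the set of "sensitive"
# layer name tags is assumed for illustration, not taken from LightX2V.
SENSITIVE_LAYERS = {"norm", "embedding", "modulation"}

def pick_dtype(layer_name: str) -> str:
    """Run precision-sensitive layers in FP32, everything else in BF16."""
    if any(tag in layer_name for tag in SENSITIVE_LAYERS):
        return "fp32"
    return "bf16"

# Build a precision plan for a few example layer names.
plan = {name: pick_dtype(name) for name in
        ["attn.qkv", "norm.final", "mlp.fc1", "embedding.time"]}
```

Bulk layers (attention projections, MLPs) run in BF16 for speed and memory savings, while numerically delicate operations such as normalization stay in FP32.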
LightX2V_2
Uses unified BF16 precision throughout, further reducing memory usage and computational overhead while maintaining generation quality
LightX2V_3
Introduces FP8 quantization to significantly lower compute-precision requirements, combined with Tiling VAE to optimize memory usage
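The Tiling VAE idea (decoding the latent in overlapping tiles so that peak activation memory scales with the tile size rather than the full frame size) can be sketched as follows; the tile and overlap sizes are illustrative defaults, not LightX2V's:

```python
# Sketch of tiled decoding along one spatial axis: split [0, length) into
# overlapping tiles; overlap regions are later blended to hide seams.
# tile=32 / overlap=8 are assumed example values.
def tile_ranges(length: int, tile: int = 32, overlap: int = 8):
    """Return (start, end) ranges covering [0, length) with overlap."""
    stride = tile - overlap
    starts = range(0, max(length - overlap, 1), stride)
    return [(s, min(s + tile, length)) for s in starts]
```

For a 64-pixel axis this yields tiles (0, 32), (24, 56), (48, 64): the decoder only ever holds one tile's activations, bounding peak VRAM regardless of frame size.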
LightX2V_3-Distill
Based on LightX2V_3, using a 4-step distillation model (`infer_steps=4`, `enable_cfg=False`) to further reduce inference steps while maintaining generation quality
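The effect of the two flags quoted above can be illustrated with a small calculation; the config dict below is a hypothetical fragment for illustration, not LightX2V's actual config schema:

```python
# Hypothetical config fragment: only infer_steps and enable_cfg come from
# the text above; the surrounding structure is assumed.
distill_config = {"infer_steps": 4, "enable_cfg": False}

def dit_forwards(cfg: dict) -> int:
    """Count DiT forward passes for a run. With CFG enabled each step
    costs two passes (conditional + unconditional); with it disabled,
    one pass per step."""
    per_step = 2 if cfg["enable_cfg"] else 1
    return cfg["infer_steps"] * per_step
```

A typical 50-step run with CFG costs 100 forward passes; the distilled 4-step, no-CFG setting costs only 4, which is where most of the speedup comes from.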
LightX2V_4
Based on LightX2V_3, adding TeaCache (`teacache_thresh=0.2`) cache-reuse technology, which accelerates inference by intelligently skipping redundant computation
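A minimal sketch of the TeaCache decision rule, assuming a precomputed scalar "relative input change" per step (the real method accumulates a polynomially rescaled difference of timestep-modulated inputs); `teacache_thresh` mirrors the value quoted above:

```python
# Sketch of TeaCache-style step skipping: accumulate the relative change
# between consecutive steps; while it stays below the threshold, reuse the
# cached residual instead of running the DiT. The rel_changes signal here
# is an assumed stand-in for the real accumulated-difference metric.
def plan_steps(rel_changes, teacache_thresh: float = 0.2):
    """Return a per-step decision: True = recompute, False = reuse cache."""
    acc = 0.0
    decisions = []
    for change in rel_changes:
        acc += change
        if acc >= teacache_thresh:
            decisions.append(True)   # accumulated change too large: recompute
            acc = 0.0                # reset accumulator after a real pass
        else:
            decisions.append(False)  # reuse cached residual, skip the DiT
    return decisions
```

A lower `teacache_thresh` recomputes more often (higher fidelity, less speedup); a higher one skips more steps.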
Implementation based on the Wan2GP repository, using MMGP optimization. The Profile=3 configuration targets RTX 3090/4090 environments with at least 32 GB RAM and 24 GB VRAM, adapting to limited system memory by trading off VRAM. Uses quantized models: 480P model and 720P model
LightX2V_5
Uses SageAttention2 in place of the native attention mechanism, adopts DIT FP8+FP32 mixed-precision computation (FP32 for a few precision-sensitive layers), and enables CPU offload: data from the DIT inference process is asynchronously offloaded to the CPU at block-level granularity to save VRAM
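Block-level offload can be pictured as a simple pipeline schedule: while block i computes on the GPU, block i+1's weights are prefetched and block i-1's are evicted back to CPU memory. The scheme below is an illustrative sketch, not LightX2V's actual offload manager:

```python
# Schematic block-level offload schedule. Only one or two transformer
# blocks are resident on the GPU at a time; transfers overlap compute
# (in a real system, via asynchronous copies on a separate stream).
def offload_schedule(num_blocks: int):
    """Yield (prefetch, compute, evict) block indices for each step."""
    for i in range(num_blocks):
        prefetch = i + 1 if i + 1 < num_blocks else None
        evict = i - 1 if i > 0 else None
        yield prefetch, i, evict
```

Because only a sliding window of blocks occupies VRAM, peak usage is bounded by a couple of blocks rather than the whole DIT, at the cost of PCIe transfer overhead that the asynchronous prefetch largely hides.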
LightX2V_5-Distill
Based on LightX2V_5, using a 4-step distillation model (`infer_steps=4`, `enable_cfg=False`) to further reduce inference steps while maintaining generation quality
LightX2V_6
Based on LightX2V_3 with CPU offload enabled: a few precision-sensitive layers run in FP32, and data from the DIT inference process is asynchronously offloaded to the CPU at block-level granularity to save VRAM
LightX2V_6-Distill
Based on LightX2V_6, using a 4-step distillation model (`infer_steps=4`, `enable_cfg=False`) to further reduce inference steps while maintaining generation quality
Configuration Files Reference
Benchmark-related configuration files and execution scripts are available at: