Changes from 1 commit

52 commits
5e5e9db
Fix: WAN 2.2 I2V boundary detection, AdamW8bit OOM crash, and add gra…
Oct 22, 2025
12e2b37
Improve video training with better bucket allocation
Oct 28, 2025
a1f70bc
Fix MoE training: per-expert LR logging and param group splitting
Oct 28, 2025
a2749c5
Add progressive alpha scheduling and comprehensive metrics tracking f…
Oct 29, 2025
86d107e
Merge remote-tracking branch 'upstream/main'
Oct 29, 2025
c91628e
Update README with comprehensive fork documentation and alpha scheduling tutorials
Oct 29, 2025
61143d6
Add comprehensive beginner-friendly documentation and UI improvements
Oct 29, 2025
96b1bda
Remove sponsors section from README - this is a fork without sponsors
Oct 29, 2025
bce9866
Fix confusing expert metrics display - add current training status
Oct 29, 2025
bd45a9e
Fix UnboundLocalError: remove redundant local 'import os'
Oct 29, 2025
abbe765
Add metrics API endpoint and UI components for real-time training mon…
Oct 29, 2025
edaf27d
Fix: Always show Loss Trend Analysis section with collection progress
Oct 29, 2025
a551b65
Fix: SVG charts now display correctly - add viewBox for proper coordi…
Oct 29, 2025
1682199
Fix: Downsample metrics to 500 points and lower phase transition thre…
Oct 30, 2025
885bbd4
Add comprehensive training recommendations based on research
Oct 30, 2025
705c5d3
Fix TRAINING_RECOMMENDATIONS for motion training
Oct 30, 2025
54c059a
Fix metrics to use EMA instead of simple averages
Oct 30, 2025
20b3c12
FIX CRITICAL BUG: Training loop re-doing checkpoint step on resume
Oct 30, 2025
226d19d
Remove useless checkpoint analyzer script
Oct 30, 2025
66978dd
Fix: Export EMA metrics to JSONL for UI visualization
Oct 30, 2025
fa12a08
Fix: Optimizer state loading counting wrong number of params for MoE
Oct 30, 2025
264c162
Fix: Set current_expert_name for metrics tracking
Oct 30, 2025
aecc467
Fix alpha scheduler not loading for MoE models on resume
Oct 31, 2025
b1ea60f
feat: Add SageAttention support for Wan models
Nov 4, 2025
20d689d
Fix CRITICAL metrics regression: boundary misalignment on resume + ad…
Nov 4, 2025
8b8506c
Merge feature/sageattention-wan-support into main
Nov 4, 2025
6a7ecac
docs: Update README with SageAttention and metrics fixes
Nov 4, 2025
850db0f
docs: Update installation instructions to use PyTorch nightly
Nov 4, 2025
26e9bdb
docs: Major README overhaul - Focus on Wan 2.2 I2V optimization
Nov 4, 2025
88785a9
docs: Fix Blackwell CUDA requirements - CUDA 13.0 not 12.8
Nov 4, 2025
0cacab8
Fix: torchao quantized tensors don't support copy argument in .to()
Nov 4, 2025
3ad8bfb
Fix critical FP16 hardcoding causing low-noise training instability
Nov 4, 2025
8589967
Fix metrics UI cross-contamination in per-expert windows
Nov 4, 2025
47dff0d
Fix FP16 hardcoding in TrainSliderProcess mask processing
Nov 4, 2025
eeeeb2e
Fix LR scheduler stepping to respect gradient accumulation
Nov 4, 2025
f026f35
CRITICAL: Fix VAE dtype mismatch in Wan encode_images
Nov 5, 2025
c7c3459
CRITICAL: Revert CFG-zero to be optional (match Ostris Nov 4 update)
Nov 5, 2025
728b46d
CRITICAL: Fix multiple SageAttention bugs causing training instability
Nov 5, 2025
7c9b205
Additional SageAttention and VAE dtype refinements
Nov 5, 2025
1d9dc98
Fix rotary embedding application to match Diffusers WAN reference
Nov 5, 2025
67445b9
Add temporal_jitter parameter for video frame sampling
Nov 5, 2025
ab59f00
Document temporal_jitter feature in README
Nov 5, 2025
80ff3db
Fix VAE dtype handling for WAN 2.2 I2V training to prevent blurry sam…
Nov 5, 2025
384ce94
Fix MoE UI metrics bugs and optimizer state restoration
Nov 6, 2025
b7cf917
Disable SageAttention for training (inference-only)
Nov 7, 2025
55b1dc2
Revise README for SageAttention and feature updates
relaxis Nov 7, 2025
fd208dc
Update README to reflect changes and optimizations
relaxis Nov 7, 2025
e1570af
Revise README for alpha scheduling and metrics updates
relaxis Nov 7, 2025
d6973e6
Remove sageattention from requirements.txt
Nov 16, 2025
26e6415
Added Differential Guidance training target
jaretburkett Nov 10, 2025
96bdb42
Do not copy pin memory if it fails, just move
jaretburkett Nov 17, 2025
64b3e52
Fix issue where text encoder was not fully unloaded in some instances
jaretburkett Nov 19, 2025
Update README with comprehensive fork documentation and alpha scheduling tutorials

Major README overhaul to properly integrate fork features throughout the document
instead of just having a separate "Fork Enhancements" section.

Changes:

1. Updated Title and Introduction
   - Clear fork identification with feature highlights
   - Added visual separator between original (Ostris) and enhanced (Relaxis) versions
   - Highlighted key improvements: 75-85% success rate vs 40-50% baseline

2. Installation Instructions
   - Updated git clone URLs to use relaxis/ai-toolkit
   - Added instructions for both Linux and Windows
   - Included note about using original version (ostris/ai-toolkit)
   - Updated RunPod and Modal setup instructions

3. FLUX Training Tutorial Enhancement
   - Added step 3: Enable alpha scheduling (optional but recommended)
   - New section "Using Alpha Scheduling with FLUX" with example config
   - Image-optimized thresholds for FLUX models
   - Metrics logging location documented

4. RunPod Integration
   - Updated to reference Ostris' affiliate link (credit where due)
   - Added fork-specific setup steps
   - Maintained link to original tutorial video

5. Modal Integration
   - Updated git clone command to use relaxis fork
   - Option to use original version documented

6. New Section: Video (I2V) Training with Alpha Scheduling
   - Complete video training tutorial with alpha scheduling
   - Video-optimized thresholds explanation (10-100x variance)
   - Dataset setup instructions for video/I2V training
   - WAN 2.2 14B I2V specific configuration examples
   - MoE (Mixture of Experts) settings documented
   - Expected metrics ranges for video vs image training
   - Monitoring guidelines specific to video training

Structure Improvements:
- Fork features now integrated throughout relevant sections
- Installation points to fork by default, original as alternative
- Training tutorials include alpha scheduling as recommended option
- Video training has dedicated section with complete examples
- Maintains credit to Ostris for original work and resources

The README now serves as comprehensive documentation for both
the fork-specific enhancements and the underlying AI Toolkit functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
AI Toolkit Contributor and claude committed Oct 29, 2025
commit c91628ed991ac1fdabd957d97293901eee5cffea
178 changes: 165 additions & 13 deletions README.md
@@ -1,6 +1,16 @@
# AI Toolkit by Ostris
# AI Toolkit (Relaxis Enhanced Fork)

AI Toolkit is an all in one training suite for diffusion models. I try to support all the latest models on consumer grade hardware. Image and video models. It can be run as a GUI or CLI. It is designed to be easy to use but still have every feature imaginable.
**🚀 Enhanced fork with Progressive Alpha Scheduling, Advanced Metrics, and Video Training Optimizations**

AI Toolkit is an all-in-one training suite for diffusion models supporting the latest image and video models on consumer hardware. This fork adds intelligent alpha scheduling that automatically adjusts LoRA capacity through training phases, comprehensive metrics tracking, and video-specific optimizations.

**Fork Features:**
- 📊 **Progressive Alpha Scheduling** - Automatic phase transitions (α=8→14→20) based on loss convergence
- 📈 **Advanced Metrics Tracking** - Real-time loss trends, gradient stability, R² confidence
- 🎥 **Video Training Optimizations** - Thresholds tuned for 10-100x higher variance in video
- 🔧 **Improved Training Success** - 40-50% baseline → 75-85% with alpha scheduling

**Original by Ostris** | **Enhanced by Relaxis**

## Support My Work

@@ -372,10 +382,11 @@ Requirements:
- python venv
- git

**Install this enhanced fork:**

Linux:
```bash
git clone https://github.com/ostris/ai-toolkit.git
git clone https://github.com/relaxis/ai-toolkit.git
cd ai-toolkit
python3 -m venv venv
source venv/bin/activate
@@ -386,17 +397,21 @@ pip3 install -r requirements.txt

Windows:

If you are having issues with Windows. I recommend using the easy install script at [https://github.com/Tavris1/AI-Toolkit-Easy-Install](https://github.com/Tavris1/AI-Toolkit-Easy-Install)
If you are having issues with Windows, I recommend using the easy install script at [https://github.com/Tavris1/AI-Toolkit-Easy-Install](https://github.com/Tavris1/AI-Toolkit-Easy-Install) (modify the git clone URL to use `relaxis/ai-toolkit`)

```bash
git clone https://github.com/ostris/ai-toolkit.git
git clone https://github.com/relaxis/ai-toolkit.git
cd ai-toolkit
python -m venv venv
.\venv\Scripts\activate
pip install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

**Or install the original version:**

Replace `relaxis/ai-toolkit` with `ostris/ai-toolkit` in the commands above.


# AI Toolkit UI

@@ -489,13 +504,48 @@ You also need to adjust your sample steps since schnell does not require as many
### Training
1. Copy the example config file located at `config/examples/train_lora_flux_24gb.yaml` (`config/examples/train_lora_flux_schnell_24gb.yaml` for schnell) to the `config` folder and rename it to `whatever_you_want.yml`
2. Edit the file following the comments in the file
3. Run the file like so `python run.py config/whatever_you_want.yml`
3. **(Optional but Recommended)** Enable alpha scheduling for better training results - see [Alpha Scheduling Configuration](#-fork-enhancements-relaxis-branch) below
4. Run the file like so `python run.py config/whatever_you_want.yml`

A folder with the name and the training folder from the config file will be created when you start. It will have all
checkpoints and images in it. You can stop the training at any time using ctrl+c and when you resume, it will pick back up
from the last checkpoint.

IMPORTANT. If you press crtl+c while it is saving, it will likely corrupt that checkpoint. So wait until it is done saving
**IMPORTANT:** If you press ctrl+c while it is saving, it will likely corrupt that checkpoint. So wait until it is done saving.

#### Using Alpha Scheduling with FLUX

To enable progressive alpha scheduling for FLUX training, add the following to your `network` config:

```yaml
network:
  type: "lora"
  linear: 128
  linear_alpha: 128
  alpha_schedule:
    enabled: true
    linear_alpha: 128  # Fixed alpha for linear layers
    conv_alpha_phases:
      foundation:
        alpha: 64  # Conservative start
        min_steps: 1000
        exit_criteria:
          loss_improvement_rate_below: 0.001
          min_gradient_stability: 0.55
          min_loss_r2: 0.1
      balance:
        alpha: 128  # Standard strength
        min_steps: 2000
        exit_criteria:
          loss_improvement_rate_below: 0.001
          min_gradient_stability: 0.55
          min_loss_r2: 0.1
      emphasis:
        alpha: 192  # Strong final phase
        min_steps: 1000
```

This will automatically transition through training phases based on loss convergence and gradient stability. Metrics are logged to `output/{job_name}/metrics_{job_name}.jsonl` for monitoring.
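
For intuition, here is a minimal sketch of how such exit criteria could be evaluated; the function and threshold names mirror the config keys above but are illustrative, not the toolkit's actual internals:

```python
# Illustrative sketch only -- mirrors the exit_criteria keys above,
# not the toolkit's real implementation.
def phase_ready_to_exit(steps_in_phase: int,
                        loss_improvement_rate: float,
                        gradient_stability: float,
                        loss_r2: float,
                        criteria: dict,
                        min_steps: int) -> bool:
    """Return True once a phase has plateaued per the configured criteria."""
    if steps_in_phase < min_steps:
        return False  # never exit before the phase's minimum step count
    return (loss_improvement_rate < criteria["loss_improvement_rate_below"]
            and gradient_stability >= criteria["min_gradient_stability"]
            and loss_r2 >= criteria["min_loss_r2"])
```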

### Need help?

@@ -518,19 +568,23 @@ You will instantiate a UI that will let you upload your images, caption them, tr


## Training in RunPod
If you would like to use Runpod, but have not signed up yet, please consider using [my Runpod affiliate link](https://runpod.io?ref=h0y9jyr2) to help support this project.
If you would like to use Runpod, but have not signed up yet, please consider using [Ostris' Runpod affiliate link](https://runpod.io?ref=h0y9jyr2) to help support the original project.

Ostris maintains an official Runpod Pod template which can be accessed [here](https://console.runpod.io/deploy?template=0fqzfjy6f3&ref=h0y9jyr2).

I maintain an official Runpod Pod template here which can be accessed [here](https://console.runpod.io/deploy?template=0fqzfjy6f3&ref=h0y9jyr2).
To use this enhanced fork on RunPod:
1. Start with the official template
2. Clone this fork instead: `git clone https://github.com/relaxis/ai-toolkit.git`
3. Follow the same setup process

I have also created a short video showing how to get started using AI Toolkit with Runpod [here](https://youtu.be/HBNeS-F6Zz8).
See Ostris' video tutorial on getting started with AI Toolkit on Runpod [here](https://youtu.be/HBNeS-F6Zz8).

## Training in Modal

### 1. Setup
#### ai-toolkit:
#### ai-toolkit (Enhanced Fork):
```
git clone https://github.com/ostris/ai-toolkit.git
git clone https://github.com/relaxis/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive
python -m venv venv
@@ -539,6 +593,8 @@ pip install torch
pip install -r requirements.txt
pip install --upgrade accelerate transformers diffusers huggingface_hub #Optional, run it if you run into issues
```

Or use the original: `git clone https://github.com/ostris/ai-toolkit.git`
#### Modal:
- Run `pip install modal` to install the modal Python package.
- Run `modal setup` to authenticate (if this doesn’t work, try `python -m modal setup`).
@@ -651,6 +707,102 @@ To learn more about LoKr, read more about it at [KohakuBlueleaf/LyCORIS](https:/
Everything else should work the same including layer targeting.


## Video (I2V) Training with Alpha Scheduling

Video training benefits significantly from alpha scheduling due to the 10-100x higher variance compared to image training. This fork includes optimized presets for video models like WAN 2.2 14B I2V.

### Example Configuration for Video Training

See the complete example at [`config_examples/i2v_lora_alpha_scheduling.yaml`](config_examples/i2v_lora_alpha_scheduling.yaml)

**Key differences for video vs image training:**

```yaml
network:
  type: lora
  linear: 64
  linear_alpha: 16
  conv: 64
  alpha_schedule:
    enabled: true
    linear_alpha: 16
    conv_alpha_phases:
      foundation:
        alpha: 8
        min_steps: 2000
        exit_criteria:
          # Video-optimized thresholds (10-100x more tolerant)
          loss_improvement_rate_below: 0.005  # vs 0.001 for images
          min_gradient_stability: 0.50        # vs 0.55 for images
          min_loss_r2: 0.01                   # vs 0.1 for images
      balance:
        alpha: 14
        min_steps: 3000
        exit_criteria:
          loss_improvement_rate_below: 0.005
          min_gradient_stability: 0.50
          min_loss_r2: 0.01
      emphasis:
        alpha: 20
        min_steps: 2000
```

### Video Training Dataset Setup

Video datasets should be organized as:
```
/datasets/your_videos/
├── video1.mp4
├── video1.txt (caption)
├── video2.mp4
├── video2.txt
└── ...
```
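
A missing caption file is an easy mistake with this layout; a small hypothetical helper (not part of the toolkit) can sanity-check the pairing:

```python
# Hypothetical helper: verify every .mp4 has a matching .txt caption.
from pathlib import Path

def check_captions(folder: str) -> bool:
    missing = [v for v in Path(folder).glob("*.mp4")
               if not v.with_suffix(".txt").exists()]
    for v in missing:
        print(f"Missing caption: {v.name}")
    return not missing

check_captions("/datasets/your_videos")
```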

For I2V (image-to-video) training:
```yaml
datasets:
  - folder_path: /path/to/videos
    caption_ext: txt
    caption_dropout_rate: 0.3
    resolution: [512]
    max_pixels_per_frame: 262144
    shrink_video_to_frames: true
    num_frames: 33  # or 41, 49, etc.
    do_i2v: true  # Enable I2V mode
```

### Monitoring Video Training

Video training produces noisier metrics than image training. Expect:
- **Loss R²**: 0.007-0.05 (vs 0.1-0.3 for images)
- **Gradient Stability**: 0.45-0.60 (vs 0.55-0.70 for images)
- **Phase Transitions**: Longer times to plateau (video variance is high)

Check metrics at: `output/{job_name}/metrics_{job_name}.jsonl`
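
As a starting point for your own monitoring, here is a minimal sketch that tails the JSONL log; the field names (`step`, `loss`, `gradient_stability`, `loss_r2`) are assumptions about the schema, so inspect a line of your own log first:

```python
import json

# Field names below are assumed, not a documented schema -- check your log.
def tail_metrics(path: str, n: int = 5):
    """Return the last n records from the JSONL metrics log."""
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    return records[-n:]

# "my_job" is a placeholder job name.
for rec in tail_metrics("output/my_job/metrics_my_job.jsonl"):
    print(rec.get("step"), rec.get("loss"),
          rec.get("gradient_stability"), rec.get("loss_r2"))
```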

### Supported Video Models

- **WAN 2.2 14B I2V** - Image-to-video generation with MoE (Mixture of Experts)
- **WAN 2.1** - Earlier I2V model
- Other video diffusion models with LoRA support

For WAN 2.2 14B I2V, ensure you enable MoE-specific settings:
```yaml
model:
  name_or_path: "ai-toolkit/Wan2.2-I2V-A14B-Diffusers-bf16"
  arch: "wan22_14b_i2v"
  quantize: true
  qtype: "uint4|ostris/accuracy_recovery_adapters/wan22_14b_i2v_torchao_uint4.safetensors"
  model_kwargs:
    train_high_noise: true
    train_low_noise: true

train:
  switch_boundary_every: 100  # Switch between experts every 100 steps
```
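
As a rough sketch of the alternation that `switch_boundary_every` implies (the toolkit's actual MoE scheduling may differ), the active expert can be thought of as:

```python
SWITCH_BOUNDARY_EVERY = 100  # matches the config value above

def active_expert(step: int) -> str:
    """Alternate experts in blocks of SWITCH_BOUNDARY_EVERY steps."""
    return "high_noise" if (step // SWITCH_BOUNDARY_EVERY) % 2 == 0 else "low_noise"
```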


## Updates

Only larger updates are listed here. There are usually smaller daily updates that are omitted.