CrazyDave999/Mini-SGLang

Mini-SGLang

A simplified version of the SGLang project, built for study purposes.

Roadmap

Basic Architecture

  • Tokenizer, Detokenizer, and Scheduler processes
  • Model Runner forward
  • ZMQ IPC for worker/control requests
  • Server & APIs

Memory Management

  • PageManager
    • page size = 1
    • page size > 1
  • KVCache
  • RadixCache
    • Insert Prefix
    • Prefix Matching
    • Evict
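
To make the three RadixCache items above concrete, here is a minimal sketch of a prefix cache over token IDs. It is not this repository's implementation: the names `ToyPrefixCache`, `insert`, `match_prefix`, and `evict` are hypothetical, and it assumes a plain trie with one token per node and LRU eviction of leaves. A real radix tree would compress runs of tokens into single edges and map them to KV cache pages.

```python
from itertools import count

_clock = count()  # logical clock used only for LRU ordering


class Node:
    def __init__(self, parent=None, token=None):
        self.children = {}   # token id -> Node
        self.parent = parent
        self.token = token
        self.last_used = 0


class ToyPrefixCache:
    """Hypothetical token-level trie; one node per cached token."""

    def __init__(self):
        self.root = Node()

    def insert(self, tokens):
        """Insert a token sequence, reusing any prefix already cached."""
        node = self.root
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = Node(parent=node, token=tok)
            node = node.children[tok]
            node.last_used = next(_clock)

    def match_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            child.last_used = next(_clock)
            node, matched = child, matched + 1
        return matched

    def evict(self, num_tokens):
        """Drop up to num_tokens least-recently-used leaves; return the count dropped."""
        evicted = 0
        while evicted < num_tokens:
            leaves = self._leaves()
            if not leaves:
                break
            victim = min(leaves, key=lambda n: n.last_used)
            del victim.parent.children[victim.token]
            evicted += 1
        return evicted

    def _leaves(self):
        stack, leaves = [self.root], []
        while stack:
            node = stack.pop()
            if not node.children and node is not self.root:
                leaves.append(node)
            stack.extend(node.children.values())
        return leaves


cache = ToyPrefixCache()
cache.insert([1, 2, 3, 4])
cache.insert([1, 2, 5])
print(cache.match_prefix([1, 2, 3, 9]))  # 3 tokens can be reused
```

Evicting only leaves keeps every remaining node reachable from the root, so a shared prefix outlives the less recently used branches that extend it.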

Scheduler

  • FCFS/Random
  • Cache-Aware (LPM)
  • Aggressive max_new_tokens prediction & Retracting
  • Chunked Prefill
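
As a rough illustration of two of the scheduling ideas listed above, the sketch below orders a waiting queue by longest prefix match (LPM) and splits a long prompt into prefill chunks. It assumes requests are plain lists of token IDs and that the cache is just a list of previously seen sequences. The function names `order_lpm` and `chunk_prefill` are hypothetical and are not taken from this repository.

```python
def common_prefix_len(a, b):
    """Length of the shared leading run of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def order_lpm(waiting, cached):
    """Sort waiting requests so those with the longest cached prefix run first.

    sorted() is stable even with reverse=True, so requests with equal
    match length keep their arrival (FCFS) order.
    """
    def best_match(req):
        return max((common_prefix_len(req, c) for c in cached), default=0)

    return sorted(waiting, key=best_match, reverse=True)


def chunk_prefill(tokens, chunk_size):
    """Split a prompt into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


waiting = [[9, 9], [1, 2, 3, 7], [1, 2, 8]]
cached = [[1, 2, 3, 4]]
print(order_lpm(waiting, cached))        # [[1, 2, 3, 7], [1, 2, 8], [9, 9]]
print(chunk_prefill(list(range(5)), 2))  # [[0, 1], [2, 3], [4]]
```

Running the requests with the longest cached prefix first tends to raise cache reuse before eviction, while chunking bounds how many prompt tokens a single forward pass has to process.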

Backend

  • Torch Native kernels
  • FA3 support

Distributed Support

  • Tensor Parallelism
  • Data Parallel Attention (for MLA)
  • MoE support
    • FusedMoE
    • EPMoE

Models

  • Llama 3
  • Mixtral MoE

Optimizations

  • Stream Output
  • CUDA graph forward
  • Overlap Scheduling
  • Unit tests

About

Mini-SGLang is a mini version of the SGLang project.
