A simplified version of the SGLang project, for study purposes.
- Tokenizer, Detokenizer, and Scheduler processes
- Model Runner forward
- ZMQ IPC for worker/control requests
- Server & APIs
- PageManager
  - page size = 1
  - page size > 1
- KVCache
- RadixCache
  - Insert Prefix
  - Prefix Matching
  - Evict
- FCFS/Random scheduling
- Cache-aware scheduling (LPM, longest prefix match)
- Aggressive max_new_tokens prediction & Retracting
- Chunked Prefill
- Torch Native kernels
- FA3 support
- Tensor Parallelism
- Data Parallel Attention (for MLA)
- MoE support
  - FusedMoE
  - EPMoE
- Llama 3
- Mixtral MoE
- Stream Output
- CUDA graph forward
- Overlap Scheduling
- Unit tests
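
To illustrate the three RadixCache operations listed above (insert prefix, prefix matching, evict), here is a minimal sketch. It is not this project's implementation: it assumes page size = 1, stores one token per edge (a plain trie rather than a compressed radix tree), and evicts least recently used leaves. The names `Node`, `PrefixCache`, `insert`, `match_prefix`, and `evict` are hypothetical and chosen only for this example.

```python
import itertools


class Node:
    """One token per edge; a simplification of a compressed radix tree."""

    def __init__(self, parent=None, token=None):
        self.parent = parent
        self.token = token
        self.children = {}
        self.last_access = 0


class PrefixCache:
    """Hypothetical token-level prefix cache (assumes page size = 1)."""

    def __init__(self):
        self.root = Node()
        self._clock = itertools.count(1)  # logical time used for LRU ordering
        self.size = 0  # number of cached tokens (nodes below the root)

    def insert(self, tokens):
        """Cache a token sequence; return how many tokens were newly added."""
        now = next(self._clock)
        node, added = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                child = Node(parent=node, token=tok)
                node.children[tok] = child
                added += 1
            child.last_access = now
            node = child
        self.size += added
        return added

    def match_prefix(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        now = next(self._clock)
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            child.last_access = now
            node, matched = child, matched + 1
        return matched

    def evict(self, num_tokens):
        """Remove up to `num_tokens` tokens, least recently used leaves first."""
        evicted = 0
        while evicted < num_tokens and self.size > 0:
            leaf = min(self._leaves(), key=lambda n: n.last_access)
            del leaf.parent.children[leaf.token]
            self.size -= 1
            evicted += 1
        return evicted

    def _leaves(self):
        """Collect all leaf nodes with an iterative depth-first walk."""
        stack, leaves = [self.root], []
        while stack:
            node = stack.pop()
            if node.children:
                stack.extend(node.children.values())
            elif node is not self.root:
                leaves.append(node)
        return leaves
```

Evicting only leaves keeps every remaining cached sequence a valid prefix, so a shared prefix outlives the branches that extend it. A cache-aware (LPM) policy could then order waiting requests by their `match_prefix` result, longest first, although that ordering step is not shown here.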