- https://rogerw.io
- in/rogerywang
- @rogerw0108
Stars
A framework for efficient model inference with omni-modality models
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
A Datacenter Scale Distributed Inference Serving Framework
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
FlashMLA: Efficient Multi-head Latent Attention Kernels
How to optimize some algorithms in CUDA.
Entropy Based Sampling and Parallel CoT Decoding
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A high-throughput and memory-efficient inference and serving engine for LLMs



