vivo (Hangzhou, Zhejiang, China, UTC +08:00)
Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Making large AI models cheaper, faster and more accessible
SGLang is a high-performance serving framework for large language models and multimodal models.
Code samples for my book "Neural Networks and Deep Learning"
Hackable and optimized Transformers building blocks, supporting composable construction.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
A large-scale 7B pretraining language model developed by BaiChuan-Inc.
📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
An open-source AI video workspace driven by AI Agents: novel → character/scene/prop design → script → storyboard → video, with cross-shot character and scene consistency. Powered by Nano Banana 2 & Veo 3.1 / Grok / Seedance / OpenAI.
https://wavespeed.ai/ An inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
A minimalist SOTA LaTeX OCR model with only 20M parameters, running in the browser. Full training pipeline available for self-reproduction.
A low-latency & high-throughput serving engine for LLMs
[CVPR 2025] FaithDiff for Classic Film Rejuvenation, Old Photo Revival, Social Media Restoration, Image Enhancement and AIGC Enhancement.
A memory efficient DLRM training solution using ColossalAI

