  • vivo
  • Hangzhou, Zhejiang, China

23 stars written in Python

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 76,897 15,681 Updated Apr 16, 2026

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 42,126 4,797 Updated Apr 16, 2026

Making large AI models cheaper, faster and more accessible

Python 41,369 4,515 Updated Apr 13, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 25,893 5,395 Updated Apr 16, 2026

Code samples for my book "Neural Networks and Deep Learning"

Python 17,589 7,028 Updated Jun 2, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,419 773 Updated Mar 30, 2026

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,779 685 Updated Apr 16, 2026

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

Python 5,668 505 Updated Jul 18, 2024

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 5,144 361 Updated Apr 9, 2026

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,720 386 Updated Apr 9, 2026

My self-study notes, updated for life

Python 3,952 484 Updated Apr 7, 2026

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 2,597 315 Updated Apr 9, 2026

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 2,292 195 Updated Mar 27, 2024

Open-source AI video generation workspace powered by AI Agents, Nano Banana 2 & Veo 3.1 / Grok / Seedance / OpenAI. Pipeline: novel → character/scene/prop design → script → storyboard → video, with characters and scenes kept consistent across shots.

Python 1,775 381 Updated Apr 15, 2026

moffee: Make Markdown Ready to Present

Python 1,332 65 Updated Aug 2, 2025

https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Python 1,306 92 Updated Mar 27, 2025

A minimalist SOTA LaTeX OCR model with only 20M parameters, running in browser. Full training pipeline available for self-reproduction.

Python 793 46 Updated Feb 23, 2026

GPU documentation for humans

Python 561 71 Updated Mar 24, 2026

A low-latency & high-throughput serving engine for LLMs

Python 491 63 Updated Jan 8, 2026

[CVPR 2025] FaithDiff for Classic Film Rejuvenation, Old Photo Revival, Social Media Restoration, Image Enhancement and AIGC Enhancement.

Python 247 17 Updated Mar 25, 2026

A memory efficient DLRM training solution using ColossalAI

Python 107 13 Updated Nov 22, 2022
Python 59 17 Updated Nov 21, 2024