vivo · Hangzhou, Zhejiang, China
Stars
An open-source AI video generation workbench driven by AI Agents: novel → character/scene/prop design → script → storyboard → video, with consistent characters and scenes across shots. Powered by Nano Banana 2 & Veo 3.1 / Grok / Seedance / OpenAI.
Fast, accurate & comprehensive text measurement & layout
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A collection of modern C++ libraries, including coro_http, coro_rpc, compile-time reflection, struct_pack, struct_json, struct_xml, struct_pb, easylog, async_simple, etc.
A std::execution-style runtime context and high-performance RPC transport built on OpenUCX, including CUDA/ROCm/... devices with RDMA.
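A minimal sketch of the std::execution sender/receiver style this description refers to, written against NVIDIA's stdexec reference implementation (an assumption on my part; this project's UCX-backed scheduler and API may differ):

```cpp
#include <stdexec/execution.hpp>
#include <exec/static_thread_pool.hpp>
#include <cstdio>

int main() {
    // A thread pool stands in for a UCX-backed runtime context here.
    exec::static_thread_pool pool{4};
    auto sched = pool.get_scheduler();

    // Build a work graph lazily: nothing runs until it is awaited.
    auto work = stdexec::schedule(sched)
              | stdexec::then([] { return 21; })
              | stdexec::then([](int x) { return 2 * x; });

    // sync_wait drives the sender to completion and returns its value.
    auto [result] = stdexec::sync_wait(std::move(work)).value();
    std::printf("result = %d\n", result);  // result = 42
}
```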
A General-purpose Task-parallel Programming System in C++
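For what task parallelism looks like in code, here is a generic illustration of a diamond-shaped task graph using only the C++ standard library (not this project's own API):

```cpp
#include <future>
#include <cstdio>

// A diamond-shaped task graph: A -> (B, C) -> D.
// B and C run concurrently once A's result is ready.
int main() {
    std::future<int> a = std::async(std::launch::async, [] { return 10; });
    int av = a.get();

    std::future<int> b = std::async(std::launch::async, [av] { return av + 1; });
    std::future<int> c = std::async(std::launch::async, [av] { return av * 2; });

    int d = b.get() + c.get();   // D joins both branches
    std::printf("d = %d\n", d);  // d = 31
}
```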
A relative time formatting library, with no code.
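To show what "relative time formatting" means in practice, a minimal self-contained C++ sketch of the concept (illustrative only, not taken from the library):

```cpp
#include <chrono>
#include <string>
#include <cstdio>

// Format a past time point as a coarse human-readable offset,
// e.g. "42 seconds ago", "3 hours ago".
std::string relative(std::chrono::system_clock::time_point past) {
    using namespace std::chrono;
    auto s = duration_cast<seconds>(system_clock::now() - past).count();
    if (s < 60)    return std::to_string(s) + " seconds ago";
    if (s < 3600)  return std::to_string(s / 60) + " minutes ago";
    if (s < 86400) return std::to_string(s / 3600) + " hours ago";
    return std::to_string(s / 86400) + " days ago";
}

int main() {
    auto three_hours_ago = std::chrono::system_clock::now() - std::chrono::hours(3);
    std::printf("%s\n", relative(three_hours_ago).c_str());  // "3 hours ago"
}
```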
🦋 An infographic generation and rendering framework, bringing words to life with AI!
[CVPR 2025] FaithDiff for Classic Film Rejuvenation, Old Photo Revival, Social Media Restoration, Image Enhancement and AIGC Enhancement.
A minimalist SOTA LaTeX OCR model with only 20M parameters, running in the browser. The full training pipeline is open-sourced for self-reproduction.
Online click-to-read edition of New Concept English: click a sentence to hear it read aloud, with continuous playback; supports EN / EN+CN / CN modes.
sgl-project / DeepGEMM
Forked from deepseek-ai/DeepGEMM. DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling.
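To illustrate what "fine-grained scaling" means in such kernels: each small block of the quantized matrix carries its own scale factor, applied during accumulation, rather than one scale per whole tensor. A CPU reference sketch (int8 stands in for FP8, and the block size of 128 is an assumption; the actual CUDA kernels are far more involved):

```cpp
#include <cstdint>
#include <vector>

// C = A * B where A (MxK) is stored quantized with one float scale per
// 128-wide block of each row; B is kept in float for simplicity.
// Assumes K is divisible by BLK.
constexpr int BLK = 128;

void gemm_finegrained(int M, int N, int K,
                      const std::vector<int8_t>& A,
                      const std::vector<float>& a_scale,  // M * (K / BLK) scales
                      const std::vector<float>& B,
                      std::vector<float>& C) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int kb = 0; kb < K / BLK; ++kb) {
                // Partial dot product over one block, dequantized by its own scale.
                float part = 0.0f;
                for (int k = kb * BLK; k < (kb + 1) * BLK; ++k)
                    part += static_cast<float>(A[m * K + k]) * B[k * N + n];
                acc += part * a_scale[m * (K / BLK) + kb];
            }
            C[m * N + n] = acc;
        }
}
```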
If you want to purchase loquats (pipa) from Miyi, Panzhihua, please contact me.
SGLang is a high-performance serving framework for large language models and multimodal models.
A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language in its own right, aiming to help readers thoroughly master meta-programming. (Work in progress)
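The "Turing-complete language" framing is easy to demonstrate: template instantiation alone can compute at compile time. A classic example (mine, not from the tutorial itself):

```cpp
// Compile-time factorial via recursive template instantiation:
// the "program" runs entirely in the type system.
template <unsigned N>
struct Factorial {
    static constexpr unsigned value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {  // base case terminates the recursion
    static constexpr unsigned value = 1;
};

static_assert(Factorial<5>::value == 120, "evaluated by the compiler");

int main() { return 0; }
```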
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
A low-latency & high-throughput serving engine for LLMs
📚 LeetCUDA: modern CUDA learning notes with PyTorch for beginners 🐑, covering 200+ CUDA kernels, Tensor Cores, HGEMM, and FA-2 MMA. 🎉
📚 A curated list of awesome LLM/VLM inference papers with code: Flash-Attention, Paged-Attention, WINT8/4, parallelism, etc. 🎉
fastllm is a high-performance LLM inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB+ of memory can run the full DeepSeek model. On a dual-socket 9004/9005 server with a single GPU, the original full-precision DeepSeek model runs at 20 tps at single concurrency; the INT4-quantized model reaches 30 tps at single concurrency and 60+ tps under multiple concurrent requests.
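As background for the INT4 numbers above, a minimal sketch of symmetric 4-bit group quantization, the general technique behind such models (the group size of 32 and the packing layout are assumptions, not fastllm's actual format):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantize a group of 32 floats to signed 4-bit values in [-7, 7],
// packing two values per byte, with one float scale per group.
constexpr int GROUP = 32;

void quantize_group(const float* w, uint8_t* packed, float& scale) {
    float amax = 0.0f;
    for (int i = 0; i < GROUP; ++i) amax = std::max(amax, std::fabs(w[i]));
    scale = amax / 7.0f;
    if (scale == 0.0f) scale = 1.0f;  // all-zero group: avoid division by zero
    for (int i = 0; i < GROUP; i += 2) {
        auto q = [&](float v) {
            int x = static_cast<int>(std::lround(v / scale));
            return std::clamp(x, -7, 7) & 0x0F;  // two's-complement nibble
        };
        packed[i / 2] = static_cast<uint8_t>(q(w[i]) | (q(w[i + 1]) << 4));
    }
}

// Dequantize: sign-extend each nibble and multiply by the group scale.
float dequant_nibble(uint8_t nib, float scale) {
    int x = (nib & 0x08) ? static_cast<int>(nib) - 16 : static_cast<int>(nib);
    return x * scale;
}
```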
A lightweight, standalone C++ inference engine for Google's Gemma models.
How to optimize algorithms in CUDA.
Hackable and optimized Transformers building blocks, supporting a composable construction.
https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

