- LLM Inference Serving: Survey of Recent Advances and Opportunities
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
- BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching: https://arxiv.org/pdf/2411.16102
- FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion: https://arxiv.org/abs/2406.06858
- BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching: https://arxiv.org/abs/2412.03594
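To make the "global prefix sharing" idea named in the BatchLLM title concrete, here is a minimal, hypothetical sketch (not BatchLLM's actual implementation): requests that share a leading prompt prefix, such as a common system prompt, are grouped so the prefix could be prefilled once and its KV cache reused across the group. The function name, the whitespace "tokenizer", and the fixed `prefix_len` are illustrative assumptions.

```python
# Toy sketch of prefix-sharing batching: group requests whose prompts
# share the same leading tokens, so a serving system could prefill the
# shared prefix once per group instead of once per request.
from collections import defaultdict

def group_by_prefix(prompts, prefix_len=16):
    """Group prompts whose first `prefix_len` whitespace tokens match."""
    groups = defaultdict(list)
    for p in prompts:
        key = tuple(p.split()[:prefix_len])  # toy stand-in for real tokenization
        groups[key].append(p)
    return list(groups.values())

prompts = [
    "You are a helpful assistant. Summarize: doc A",
    "You are a helpful assistant. Summarize: doc B",
    "Translate to French: hello",
]
# The two summarization requests share a 5-token prefix and land in one group.
batches = group_by_prefix(prompts, prefix_len=5)
```

A real system would match prefixes at the token level inside the KV-cache manager (as in paged or radix-tree caches) rather than on raw strings, but the grouping step shown here is the core of the batching idea.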
- llm-optimizer