Large Model Inference Optimization

Surveys

  • LLM Inference Serving: Survey of Recent Advances and Opportunities

  • Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

  • BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching: https://arxiv.org/pdf/2411.16102

  • FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion: https://arxiv.org/abs/2406.06858

  • BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching: https://arxiv.org/abs/2412.03594