Code repository for the SIGMOD 25 paper: "Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving".
Apt-Serve is a serving framework prototype implemented on top of vLLM (release version: 0.5.0.post1). All add-ons introduced by the framework are located in the folder additional_designs.
Note that Apt-Serve is a research prototype and does not support the complete feature set of the latest vLLM; we adopted only the key parts of the codebase to enable faster research iteration.
- Install the backbone system (vLLM 0.5.0.post1) first, following the guidelines from https://github.com/vllm-project/vllm.
- Insert the additional designs:
bash additional_designs/insert_designs.sh
- Build and install the customized CUDA kernels that support the hybrid cache:
python additional_designs/mixed_cache_kernels/mixed_cache_setup.py build_ext --inplace
Once these steps are completed, the new designs are integrated into vLLM and ready for use.
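As an optional sanity check (not part of the original setup steps), the sketch below verifies that the patched vLLM imports and that the kernel extension built. The module name mixed_cache_kernels is a hypothetical placeholder; substitute whatever extension name mixed_cache_setup.py actually produces.

```python
# Optional sanity check. "mixed_cache_kernels" is a hypothetical module name;
# replace it with the extension name built by mixed_cache_setup.py.
import importlib

import vllm

print("vLLM version:", vllm.__version__)  # expected: 0.5.0.post1

try:
    importlib.import_module("mixed_cache_kernels")
    print("hybrid cache kernels: OK")
except ImportError as err:
    print("hybrid cache kernels not importable:", err)
```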
Follow the readme.md in the folder sample_requests_from_datasets to sample requests and create a serving trace.
The sampled requests are automatically saved into the ./sampled_datasets/ folder.
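The exact file name and schema of the saved trace are defined by the sampling script, so the snippet below is only a hypothetical illustration (it assumes the trace is stored as a JSON file):

```python
# Hypothetical illustration: the real file name and record schema are
# determined by the scripts in sample_requests_from_datasets.
import json
from pathlib import Path

trace_files = sorted(Path("./sampled_datasets").glob("*.json"))
print("found traces:", [p.name for p in trace_files])

if trace_files:
    with trace_files[0].open() as f:
        records = json.load(f)
    print("number of sampled requests:", len(records))
    print("first record:", records[0])
```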
We use OPT-13B as an example.
Start the server side with:
python -m vllm.entrypoints.openai.api_server --model facebook/opt-13b --enforce-eager --disable-log-requests
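To confirm the server is reachable before launching the workload, you can send a single completion request to the OpenAI-compatible endpoint exposed by the api_server (port 8000 by default); this is just a quick connectivity check, not part of the benchmark itself:

```python
# Quick connectivity check against vLLM's OpenAI-compatible completions API.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-13b",
        "prompt": "San Francisco is a",
        "max_tokens": 32,
        "temperature": 0.0,
    },
)
print(resp.json()["choices"][0]["text"])
```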
After the server is up, start the client-side code to simulate request arrivals:
python gen_client_requests.py --model facebook/opt-13b --request-rate 3 --cv 1 --dataset sharegpt
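Here --request-rate controls the average arrival rate (presumably requests per second) and --cv the coefficient of variation of the inter-arrival gaps (cv = 1 corresponds to Poisson arrivals; larger values give burstier traffic). The sketch below shows one common way to generate such an arrival process with gamma-distributed gaps; gen_client_requests.py may implement it differently.

```python
# Sketch: gamma-distributed inter-arrival gaps parameterized by request rate
# and coefficient of variation (CV). Illustrative only; the actual client
# script may generate arrivals differently.
import numpy as np


def inter_arrival_times(num_requests: int, request_rate: float, cv: float,
                        seed: int = 0) -> np.ndarray:
    """Gamma gaps with mean 1/request_rate and the given CV.

    cv = 1 reduces to exponential gaps (a Poisson process); cv > 1 is
    burstier, cv < 1 is more regular.
    """
    rng = np.random.default_rng(seed)
    shape = 1.0 / (cv ** 2)               # gamma shape k, so CV = 1/sqrt(k)
    scale = 1.0 / (request_rate * shape)  # gamma scale, so mean = 1/request_rate
    return rng.gamma(shape, scale, size=num_requests)


gaps = inter_arrival_times(num_requests=1000, request_rate=3.0, cv=1.0)
print(f"mean gap: {gaps.mean():.3f}s (target {1/3.0:.3f}s)")
```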

