Example: vLLM + LMCache with CacheBlend

LMCache should reduce the generation time of the second and subsequent calls, even when the reused KV cache is not a prefix of the new prompt.
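To make the "not a prefix" point concrete, here is a toy sketch in plain Python. It is not LMCache code and uses no LMCache API. The function names and the example chunks are made up for illustration. It only counts how many prompt tokens could be served from cache under prefix-only matching versus chunk-level matching, and it ignores whatever recomputation the real blending step performs.

```python
# Toy accounting only: compares how many tokens a prefix-only cache and a
# chunk-level cache could reuse when the same chunks arrive in a new order.
# All names here are hypothetical; nothing below calls LMCache or vLLM.

def prefix_reusable_tokens(cached: list[list[str]], new: list[list[str]]) -> int:
    """Count tokens in leading chunks of `new` that match `cached` position by position."""
    total = 0
    for old_chunk, new_chunk in zip(cached, new):
        if old_chunk != new_chunk:
            break  # a prefix cache stops at the first mismatch
        total += len(new_chunk)
    return total


def chunk_reusable_tokens(cached: list[list[str]], new: list[list[str]]) -> int:
    """Count tokens in chunks of `new` that were seen before, at any position."""
    seen = {tuple(chunk) for chunk in cached}
    return sum(len(chunk) for chunk in new if tuple(chunk) in seen)


system = ["You", "are", "helpful"]
doc_a = ["A1", "A2", "A3", "A4"]
doc_b = ["B1", "B2"]

first_call = [system, doc_a, doc_b]
second_call = [system, doc_b, doc_a]  # same chunks, different order

print(prefix_reusable_tokens(first_call, second_call))  # 3
print(chunk_reusable_tokens(first_call, second_call))   # 9
```

In this example only the shared system chunk counts as a prefix hit, while chunk-level matching covers the whole second prompt. That is the situation in which the second call is expected to get faster.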

Some ad-hoc changes needed in vLLM

In vllm/vllm/v1/worker/gpu_worker.py, comment out the call ensure_kv_transfer_initialized(vllm_config) in the function init_worker_distributed_environment.
In the same file, add the following lines at the end of the function load_model:

from lmcache.v1.compute.models.utils import VLLMModelTracker
from lmcache.integration.vllm.utils import ENGINE_NAME

VLLMModelTracker.register_model(ENGINE_NAME, self.model_runner.model)
ensure_kv_transfer_initialized(self.vllm_config)

CPU offloading

python blend.py - CacheBlend with CPU as backend

Disk offloading

python blend.py --use-disk - CacheBlend with local disk as backend
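To compare the two backends, the commands above can be wrapped in a small timing script. This is a minimal sketch. The helper names blend_command and run_and_time are hypothetical, and the only blend.py flag it uses is --use-disk, as documented above. It measures the wall-clock time of the whole process, not per-request generation time.

```python
import subprocess
import sys
import time


def blend_command(use_disk: bool = False, script: str = "blend.py") -> list[str]:
    """Build the command line for one run of the example script."""
    cmd = [sys.executable, script]
    if use_disk:
        cmd.append("--use-disk")  # switch from the CPU backend to local disk
    return cmd


def run_and_time(cmd: list[str]) -> float:
    """Run `cmd` to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

For example, `run_and_time(blend_command())` times the CPU-backed run and `run_and_time(blend_command(use_disk=True))` times the disk-backed run.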
