Description
Your current environment
INFO 11-05 12:03:00 engine.py:290] Added request chat-58cc8fe807d34717b775ea663d913bcb.
ERROR 11-05 12:03:00 client.py:250] RuntimeError('Engine loop has died')
ERROR 11-05 12:03:00 client.py:250] Traceback (most recent call last):
ERROR 11-05 12:03:00 client.py:250] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/multiprocessing/client.py", line 150, in run_heartbeat_loop
ERROR 11-05 12:03:00 client.py:250] await self._check_success(
ERROR 11-05 12:03:00 client.py:250] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/multiprocessing/client.py", line 314, in _check_success
ERROR 11-05 12:03:00 client.py:250] raise response
ERROR 11-05 12:03:00 client.py:250] RuntimeError: Engine loop has died
INFO 11-05 12:03:01 metrics.py:349] Avg prompt throughput: 6.5 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 5.0%, CPU KV cache usage: 0.0%.
INFO: 10.12.17.5:58280 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 10.12.17.5:58489 - "GET /v1/models HTTP/1.1" 200 OK
CRITICAL 11-05 12:03:03 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO: 10.12.17.5:58489 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
Model Input Dumps
export CUDA_VISIBLE_DEVICES=2
export VLLM_USE_MODELSCOPE=False
vllm serve ./Qwen2_5-14B-Instruct-AWQ \
    --host 0.0.0.0 \
    --port 2015 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.9 \
    --trust-remote-code \
    --enforce-eager \
    --enable-lora \
    --lora-modules role=/workspace/output/role/qwen/qwen2_5-14b-instruct-awq/v1-20241101-133149/checkpoint-1550
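For reference, the request that hit the 500 above can be reproduced with a plain OpenAI-compatible chat call. A minimal sketch, assuming the host/port from the serve command and that the adapter is addressed by the name `role` registered via `--lora-modules`:

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://0.0.0.0:2015"  # host/port from the serve command above


def build_payload(model: str, content: str) -> dict:
    """Single-turn chat payload; model="role" selects the LoRA adapter."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }


def chat_completion(model: str, content: str) -> dict:
    """POST to the OpenAI-compatible endpoint.

    Once the engine loop has died, this call returns HTTP 500.
    """
    req = Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_payload(model, content)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)
```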
🐛 Describe the bug
After serving requests for a while, the server crashes with the log shown above under "Your current environment": the MQLLMEngine heartbeat reports "Engine loop has died", subsequent POST /v1/chat/completions requests return 500 Internal Server Error, and the launcher terminates the server process.
vllm 0.6.3.post1
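The heartbeat only reports that the engine loop died; the root-cause traceback is raised in the separate engine process. A hedged debugging sketch, assuming the vLLM 0.6.x flags (verify against `vllm serve --help` for your build): rerun with frontend multiprocessing disabled so the engine runs inside the API-server process and its original traceback prints to the console.

```shell
# Surface the engine's own traceback instead of the generic
# "Engine loop has died" heartbeat error (flags assumed from
# vLLM 0.6.x; check `vllm serve --help` for your build).
export CUDA_VISIBLE_DEVICES=2
VLLM_LOGGING_LEVEL=DEBUG vllm serve ./Qwen2_5-14B-Instruct-AWQ \
    --host 0.0.0.0 \
    --port 2015 \
    --enforce-eager \
    --enable-lora \
    --lora-modules role=/workspace/output/role/qwen/qwen2_5-14b-instruct-awq/v1-20241101-133149/checkpoint-1550 \
    --disable-frontend-multiprocessing
```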
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.