feat: router supporting intra-worker dp routing #1285
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -15,8 +15,7 @@ | |
| Common: | ||
| model: Qwen/Qwen3-0.6B | ||
| data-parallel-size: 2 | ||
| router: kv | ||
| block-size: 64 | ||
| block-size: 16 | ||
| max-model-len: 16384 | ||
Contributor (on `max-model-len: 16384`): why do we need to set that?
| served_model_name: Qwen/Qwen3-0.6B | ||
@@ -29,7 +28,7 @@ VllmDecodeWorker: | |
| max-num-batched-tokens: 16384 | ||
Contributor (on `max-num-batched-tokens: 16384`): why do we need to set that?
| enable-prefix-caching: true | ||
Contributor (on `enable-prefix-caching: true`): It is enabled by default in V1.
| ServiceArgs: | ||
| workers: 2 # 2 workers | ||
| workers: 1 # 2 workers | ||
| resources: | ||
| gpu: 2 # 2 dp ranks | ||
| common-configs: [model, served_model_name, block-size, data-parallel-size, max-model-len] | ||