CANN: add operator fusion support for ADD+RMS_NORM operations #17512
Conversation
Before / After performance comparison (screenshots omitted).
Op test results (screenshot omitted).
### GGML_CANN_OPERATOR_FUSION

Enable operator fusion during computation; default is false. This option fuses compatible operators (e.g., ADD + RMS_NORM) to reduce overhead and improve performance.
If this feature can improve performance, should we enable it by default?
It's better if users don't have to set any parameters.
In some scenarios it may bring performance improvements, but it may also introduce unexpected issues. At the moment it should be considered experimental. Once the feature becomes stable, it will be enabled by default.
```cpp
for (int i = 0; i < cgraph->n_nodes; i++) {
    ggml_tensor * node = cgraph->nodes[i];
    if (opt_fusion) {
        if (ggml_cann_can_fuse(cgraph, i, { GGML_OP_ADD, GGML_OP_RMS_NORM })) {
```
Suggested change:

```diff
-if (ggml_cann_can_fuse(cgraph, i, { GGML_OP_ADD, GGML_OP_RMS_NORM })) {
+if (ggml_cann_can_fuse(cgraph, i, { cgraph->nodes[i]->op, cgraph->nodes[i+1]->op })) {
```
No changes are needed here. The underlying layer calls ggml's generic fuse check, which determines whether the operator sequence starting at the i-th node of the current cgraph matches.
```cpp
}

void ggml_cann_op_add_rms_norm_fused(ggml_backend_cann_context & ctx,
                                     ggml_tensor * dst,
```
Should the result be written to rms_norm's dst?
Add operator fusion optimization to improve performance by fusing compatible operations into single kernel calls. Currently supports fusing ADD and RMS_NORM operations.

Changes:
- Add new environment variable GGML_CANN_OPERATOR_FUSION to enable/disable operator fusion (default: false)
- Implement ggml_cann_op_add_rms_norm_fused() function that fuses ADD and RMS_NORM operations using the aclnnAddRmsNorm API
- Add ggml_cann_can_fuse() helper function to check if operations can be fused in the CANN backend
- Update evaluate_and_capture_cann_graph() to detect and apply operator fusion when enabled

This optimization reduces overhead between operations and improves overall computational efficiency for models using ADD followed by RMS_NORM patterns.