@Sunny-bot1 (Contributor) commented Dec 11, 2025

PR Category

Performance Optimization

PR Types

New features

Description

Adapt per-token activation quantization for the DeepEP low-latency (ll) two-stage path.

paddle-bot (bot) commented Dec 11, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@Sunny-bot1 changed the title from "support fp8 per token quant for deepep two stage" to "support fp8 per token quant for deepep low latency two stage" on Dec 12, 2025
- sizeof(int4) + (kUseFP8 ? (kHidden + kNumScales * sizeof(float))
-                         : (kHidden * sizeof(nv_bfloat16)));
+ sizeof(int4) + (kUseFP8
+                     ? (kHidden + (kNumScales + 3) / 4 * 4 * sizeof(float))
Contributor

It would be better not to hardcode the alignment to 4. Define something like constexpr ALIGN_ELEMS=xxx and align with that instead.

Contributor

The same applies to the other places.

Contributor Author

> It would be better not to hardcode the alignment to 4. Define something like constexpr ALIGN_ELEMS=xxx and align with that instead.

done

Contributor

@yangjianfengo1 yangjianfengo1 Dec 16, 2025

This is because each load reads 16 bytes, i.e. 4 floats, so the scales must be aligned to 4 floats. Do not use an environment variable for this.
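
For readers following this thread, here is a minimal host-side sketch of the named-constant alignment being discussed. It assumes the padding exists only because one vectorized load moves 16 bytes, as stated in the comment above. The names `kLoadBytes`, `kHeaderBytes`, `kScaleAlignElems`, `align_up`, and `fp8_msg_bytes` are hypothetical and are not the identifiers used in the PR. The hidden size 7168 is only an example value.

```cpp
#include <cstddef>

// Hypothetical names for illustration; not the identifiers used in the PR.
// One vectorized load moves 16 bytes, which holds 4 floats.
constexpr std::size_t kLoadBytes = 16;
// Stand-in for sizeof(int4), which is 16 bytes in CUDA.
constexpr std::size_t kHeaderBytes = 16;
// Alignment in elements, derived from the load width instead of a literal 4.
constexpr std::size_t kScaleAlignElems = kLoadBytes / sizeof(float);

// Round n up to the next multiple of `align` (align must be > 0).
constexpr std::size_t align_up(std::size_t n, std::size_t align) {
  return (n + align - 1) / align * align;
}

// Bytes per FP8 message, mirroring the expression in the diff: a header,
// `hidden` one-byte FP8 values, then the scales padded to whole 16-byte loads.
constexpr std::size_t fp8_msg_bytes(std::size_t hidden, std::size_t num_scales) {
  return kHeaderBytes + hidden +
         align_up(num_scales, kScaleAlignElems) * sizeof(float);
}

static_assert(kScaleAlignElems == 4, "a 16-byte load holds 4 floats");
static_assert(align_up(1, kScaleAlignElems) == 4,
              "per-token: a single scale is padded to 4 floats");
static_assert(align_up(56, kScaleAlignElems) == 56,
              "7168 / 128 = 56 scales are already aligned");
```

Deriving the element count from the load width keeps the reason for the padding next to the constant, which addresses the hardcoding concern without making the value configurable at run time.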

  auto num_tokens = static_cast<int>(x.size(0)),
       hidden = static_cast<int>(x.size(1));
- auto num_scales = hidden / 128, num_topk = static_cast<int>(topk_idx.size(1));
+ auto num_scales = num_per_channel == -1 ? 1 : hidden / 128,
Contributor

Now that num_per_channel is being introduced, should this be changed to hidden / num_per_channel?

Contributor Author

> Now that num_per_channel is being introduced, should this be changed to hidden / num_per_channel?

In that case, per-token quantization would have to pass hidden_size in as num_per_channel, which makes the parameters a bit cumbersome.
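
The trade-off in this exchange can be sketched by putting the two ways of computing the scale count side by side. This is an illustrative sketch only: the helper functions and constant names below are hypothetical, the `-1` sentinel and the group width of 128 are taken from the snippet under review, and 7168 is just an example hidden size.

```cpp
// Hypothetical helpers; illustrative only and not part of the PR.
constexpr int kPerTokenSentinel = -1;  // num_per_channel == -1 selects per-token
constexpr int kGroupSize = 128;        // group width used in the snippet under review

// Snippet under review: one scale per token, otherwise one scale per 128 channels.
constexpr int num_scales_with_sentinel(int hidden, int num_per_channel) {
  return num_per_channel == kPerTokenSentinel ? 1 : hidden / kGroupSize;
}

// Reviewer's suggestion: divide by num_per_channel directly. Per-token
// quantization then requires the caller to pass num_per_channel == hidden.
constexpr int num_scales_by_division(int hidden, int num_per_channel) {
  return hidden / num_per_channel;
}

// Both forms agree for 128-wide groups.
static_assert(num_scales_with_sentinel(7168, 128) == 56, "7168 / 128");
static_assert(num_scales_by_division(7168, 128) == 56, "7168 / 128");
// Per-token: the sentinel form takes -1, the division form needs the hidden size.
static_assert(num_scales_with_sentinel(7168, -1) == 1, "one scale per token");
static_assert(num_scales_by_division(7168, 7168) == 1, "one scale per token");
```

The division form generalizes to other group widths, while the sentinel form spares per-token callers from passing the hidden size, which is the convenience the author points to.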

@yangjianfengo1
Contributor

LGTM

@yuanlehome yuanlehome merged commit 5e27f87 into PaddlePaddle:develop Dec 17, 2025
77 of 83 checks passed