Performance Parity for H100_mma_ABt and H100_mma Kernels #98
Summary

This 4-line code change achieves performance parity between the transposed (H100_mma_ABt) and non-transposed (H100_mma) matmul kernels by dispatching the largest available tensor core instruction (wgmma of size 64x16x256). Previously, the transposed kernel was approximately 75-80 TFLOPS slower than its non-transposed counterpart.
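A minimal sketch of what that dispatch might look like, assuming hypothetical function names (`dispatch_h100_mma_abt`, `launch_h100_mma_abt`) rather than the PR's actual code; on H100, wgmma tiles are 64 (M) x 16 (K) x N with N up to 256, so one 64x16x256 instruction covers the same output width as four 64x16x64 instructions:

```cpp
#include <cstdio>

// Hypothetical sketch of instruction-size dispatch; the function and kernel
// names here are illustrative and not taken from the PR.
template <int InstN>
void launch_h100_mma_abt(int M, int N, int K) {
    // A real implementation would launch the CUDA kernel specialized on InstN.
    std::printf("ABt kernel: wgmma 64x16x%d for M=%d, N=%d, K=%d\n",
                InstN, M, N, K);
}

void dispatch_h100_mma_abt(int M, int N, int K) {
    if (N % 256 == 0)      launch_h100_mma_abt<256>(M, N, K);  // largest tile
    else if (N % 128 == 0) launch_h100_mma_abt<128>(M, N, K);
    else                   launch_h100_mma_abt<64>(M, N, K);   // old default
}
```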
Changes

- Increased the dispatched wgmma instruction size from 64x16x64 to 64x16x256.
- Fixed handling of the B tensor. Ensures correctness for cases where N != K, resolving previous correctness check failures (illustrated in the sketch below).
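To make the N != K point concrete, here is a CPU reference for the transposed product; this is my illustration of the bug class, not code from the kernel:

```cpp
// For C = A * B^T with row-major A (M x K) and B (N x K), the row stride of
// B is K. Any indexing that substitutes N for K is only correct when N == K,
// which is how a square-only benchmark can hide the bug.
void matmul_abt_ref(const float* A, const float* B, float* C,
                    int M, int N, int K) {
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[n * K + k];  // B stride is K, not N
            }
            C[m * N + n] = acc;
        }
    }
}
```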
Benchmark Changes

- Changed the benchmark shape from square (N=4096) to rectangular (M=2048, N=4096, K=8192) to showcase and validate performance improvements and correctness for non-square inputs (see the throughput arithmetic below).
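For context on how a TFLOPS figure is derived for the new shape, the standard matmul FLOP count is 2*M*N*K (one multiply and one add per inner-product term); the kernel time below is an assumed placeholder, not a measurement from the PR:

```cpp
#include <cstdio>

int main() {
    const long long M = 2048, N = 4096, K = 8192;
    const double flops = 2.0 * static_cast<double>(M) * N * K;  // ~137.4 GFLOP
    const double seconds = 1.0e-3;  // assumed 1 ms kernel time, for illustration
    std::printf("%.1f GFLOP -> %.1f TFLOPS at 1 ms\n",
                flops / 1e9, flops / seconds / 1e12);
    return 0;
}
```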
Testing

Verified correctness and performance improvements through internal benchmarks. Confirmed stable results and parity with H100_mma.