
[Bug] k_grouped_fp8_gemm_nt_contiguous crashes with n = 768 on H100 #237

@xinqiu

Description


First of all, thank you for the amazing work on DeepGEMM — it's been extremely helpful.
While integrating DeepGEMM into a backward pass implementation, I encountered a reproducible crash when running the k-grouped FP8 GEMM with N = 768.


❗ Error

Running DeepGEMM with the following shape causes a CUDA illegal instruction error:

RuntimeError: CUDA driver error (csrc/apis/../jit_kernels/impls/sm90_fp8_gemm_1d1d.hpp:65): 715 
(CUDA_ERROR_ILLEGAL_INSTRUCTION, an illegal instruction was encountered)

🔁 Reproduction

The issue reproduces consistently when the following shape is added to
enumerate_k_grouped_contiguous:

(128, 2048, 768, 4096)

This triggers the following call path:

  • k_grouped_fp8_gemm_nt_contiguous
  • → FP8 kernel selection
  • → SM90 kernel dispatch
  • → crash with CUDA illegal instruction

Notably, the same configuration works correctly when N = 1536, so the issue appears to be specific to N = 768.
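For reference, the entries added to the shape list consumed by enumerate_k_grouped_contiguous looked roughly like this (the tuple ordering of (num_groups, M, N, K) and the inline comments are my assumptions, inferred from the shape given above):

```python
# Assumed shape-list entries for enumerate_k_grouped_contiguous,
# in (num_groups, m, n, k) order:
(128, 2048, 768, 4096),   # crashes with CUDA_ERROR_ILLEGAL_INSTRUCTION
(128, 2048, 1536, 4096),  # identical config with N = 1536 runs fine
```

Both entries go through the same SM90 1D1D dispatch path; only the N = 768 case triggers the illegal instruction.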


🧩 Expected Behavior

The kernel should run successfully for (groups=128, M=2048, N=768, K=4096) without causing an illegal instruction.


🧪 Environment (if helpful)

  • GPU: H100 (SM90)
  • CUDA Toolkit: 12.9
  • Driver Version: 535.161.08
  • PyTorch version: 2.8.0

🙏 Additional Notes

If you need further logs or want me to test a patch, I’m happy to help.

Thanks again for the excellent work on DeepGEMM!
