Finn-Xd

Stars

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++, 12,551 stars, 1,003 forks, updated Mar 31, 2026

DeepEP: an efficient expert-parallel communication library

CUDA, 9,096 stars, 1,139 forks, updated Mar 31, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

CUDA, 6,309 stars, 852 forks, updated Mar 22, 2026

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python, 2,936 stars, 319 forks, updated Jan 14, 2026

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++, 9,790 stars, 1,027 forks, updated Mar 30, 2026