A hardware implementation of the NVFP4 (E2M1) floating-point format in SystemVerilog, synthesized in Intel Quartus, with a Python reference model for verification.
This project implements:
- Quantization of BF16 vectors to NVFP4 format with a shared group exponent
- Dequantization of NVFP4 back to BF16
- Dot product of two NVFP4 vectors
- Quantization error analysis datapath
- A Python reference model that mirrors the hardware behavior
Format parameters:
- NVFP4 (E2M1): 1 sign bit, 2 exponent bits, 1 mantissa bit
- Group size: 16 elements per shared exponent
- Bias: 2, Max exponent: 3
- Input: BF16 (16-bit)
- Output: 4-bit compressed elements + 8-bit shared exponent
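These parameters pin down the quantization scheme: each group of 16 BF16 values shares one power-of-two scale, chosen so the largest magnitude fits within the E2M1 range, and each scaled element is rounded to the nearest representable E2M1 value. The following is a minimal NumPy sketch of that scheme; the function names, the normalization rule, and the treatment of `e = 0` codes as subnormals are assumptions for illustration, not code from this repo:

```python
import numpy as np

GROUP_SIZE = 16
E2M1_BIAS = 2
MAX_EXP = 3

# Representable E2M1 magnitudes under these parameters, assuming e = 0 encodes
# subnormals: {0, 0.25} from e = 0, plus (1 + m/2) * 2^(e - 2) for e = 1..3.
E2M1_MAGS = np.array(sorted(
    {m / 2 * 2.0 ** (1 - E2M1_BIAS) for m in (0, 1)} |
    {(1 + m / 2) * 2.0 ** (e - E2M1_BIAS)
     for e in range(1, MAX_EXP + 1) for m in (0, 1)}))

def quantize_group(x):
    """Quantize one 16-element group to E2M1 values plus a shared exponent."""
    x = np.asarray(x, dtype=np.float64)
    assert x.size == GROUP_SIZE
    amax = np.abs(x).max()
    # Shared exponent: smallest power of two that brings the largest magnitude
    # into the E2M1 range (max representable value here is 1.5 * 2^1 = 3.0).
    shared = int(np.ceil(np.log2(amax / E2M1_MAGS[-1]))) if amax > 0 else 0
    scaled = np.abs(x) / 2.0 ** shared
    # Round each scaled magnitude to the nearest representable value; the RTL
    # rounds to nearest-even (fp_rnd_rne.sv), which argmin only approximates.
    idx = np.abs(scaled[:, None] - E2M1_MAGS[None, :]).argmin(axis=1)
    return np.sign(x) * E2M1_MAGS[idx], shared

def dequantize_group(q, shared):
    """Undo the shared scaling; the RTL additionally rounds back into BF16."""
    return np.asarray(q) * 2.0 ** shared
```

Under this normalization, an all-1.0 group quantizes to all 2.0 with `shared = -1`, which lines up with the all-1.0 → `0110` (= 1.0 · 2^(3−2) = 2.0) result listed below, modulo the bias the hardware adds when storing the 8-bit shared exponent.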
RTL modules:
- `conv_bf16tonvfp4.sv` — Converts a BF16 vector to NVFP4
- `conv_nvfp4tobf16.sv` — Dequantizes an NVFP4 vector back to BF16
- `quantization_error_datapath.sv` — Chains quantization and dequantization to measure error
- `dot_fp.sv` — Computes the dot product of two NVFP4 vectors
- `nvfp4_datapath_top.sv` — Top-level datapath
- `unsigned_max.sv`, `fp_rnd_rne.sv`, `clz_int.sv`, `vec_mul_fp.sv`, `vec_sum_int.sv`, `mul_fp.sv`, `mul_int.sv` — Submodules
Testbenches:
- `tb_conv_bf16tonvfp4.sv` — Verifies BF16 to NVFP4 conversion
- `tb_conv_nvfp4tobf16.sv` — Verifies NVFP4 to BF16 dequantization
- `tb_quantization_error_datapath.sv` — Verifies the quantization error datapath
- `tb_dot_fp.sv` — Verifies dot product computation
- `tb_nvfp4_datapath_top.sv` — Top-level simulation
Python reference model:
- `BF16_to_NVFP4.py` — Converts a BF16 vector to NVFP4
- `NVFP4_to_BF16.py` — Dequantizes NVFP4 back to BF16
- `quantization_error.py` — Computes the average quantization error
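As a rough counterpart to `quantization_error.py`, the two helpers from the format sketch above chain into a round-trip error check; the use of mean absolute error as the metric is an assumption:

```python
import numpy as np

def avg_quantization_error(x):
    """Round-trip one group through quantize/dequantize and report the error.

    Reuses quantize_group / dequantize_group from the sketch above; mean
    absolute error is an assumed metric, not necessarily the one used in
    quantization_error.py.
    """
    q, shared = quantize_group(x)
    x_hat = dequantize_group(q, shared)
    return float(np.mean(np.abs(np.asarray(x, dtype=np.float64) - x_hat)))

# e.g. avg_quantization_error([1.0] * 16) == 0.0  (1.0 round-trips exactly)
```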
Verification results:
- All 1.0 inputs → NVFP4 = `0110`, shared_exp = 125 ✓
- All 1.5 inputs → NVFP4 = `0111`, shared_exp = 125 ✓
- All 2.0 inputs → NVFP4 = `0110`, shared_exp = 126 ✓
- Dot product of 16× `0110` · 16× `0110` = 1024 ✓
- Dot product of 16× `0110` · 16× `0111` = 1536 ✓
- Round-trip 1.0 → NVFP4 → BF16 = 1.0, error = 0 ✓
- Round-trip 0.125 in a mixed group → dequantizes to 0.15625, nonzero error ✓
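The two dot-product rows can be reproduced with a short integer multiply-accumulate sketch. The fixed-point convention (and the final shift) here is inferred from the numbers above, not taken from `dot_fp.sv`:

```python
def e2m1_fields(code):
    """Split a 4-bit E2M1 code into sign, exponent field, and mantissa bit."""
    return (code >> 3) & 1, (code >> 1) & 3, code & 1

def dot_raw(codes_a, codes_b):
    """Integer multiply-accumulate over one 16-element group.

    Each element is treated as (2 + m) << e, i.e. twice its unscaled
    magnitude (1 + m/2) * 2^e, so every product carries a factor of 4
    that the final shift removes. The shared exponents and the E2M1 bias
    would be applied to the result afterwards; subnormal (e == 0) codes
    are not treated specially in this sketch.
    """
    acc = 0
    for a, b in zip(codes_a, codes_b):
        sa, ea, ma = e2m1_fields(a)
        sb, eb, mb = e2m1_fields(b)
        prod = ((2 + ma) << ea) * ((2 + mb) << eb)
        acc += -prod if sa ^ sb else prod
    return acc >> 2  # remove the two factor-of-2 scalings

# Reproduces the results above:
# dot_raw([0b0110] * 16, [0b0110] * 16) == 1024
# dot_raw([0b0110] * 16, [0b0111] * 16) == 1536
```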
Based on the open-source MX (Microscaling) repository, parameterized for NVFP4.