Merged
Changes from 1 commit

Commits (48)
81991fc
oai moe
ngxson Jul 7, 2025
917f923
compat with new checkpoint
ngxson Jul 7, 2025
a4ab869
add attn sink impl
ngxson Jul 7, 2025
3801c36
add rope scaling yarn
ngxson Jul 8, 2025
13f39f6
logits match with latest transformers code
ngxson Jul 8, 2025
b3594b3
wip chat template
ngxson Jul 8, 2025
bd57158
Merge branch 'master' into xsn/oai_moe
ngxson Jul 9, 2025
089a7ab
rm trailing space
ngxson Jul 9, 2025
4d01b36
use ggml_scale_bias
ngxson Jul 9, 2025
f271cc8
Merge branch 'master' into xsn/oai_moe
ngxson Jul 10, 2025
106b17e
rm redundant is_swa_all
ngxson Jul 10, 2025
e2c1beb
convert interleaved gate_up
ngxson Jul 15, 2025
4431c82
Merge remote-tracking branch 'gg-public/master' into xsn/oai_moe-gg
ggerganov Jul 20, 2025
fe9b818
Merge remote-tracking branch 'gg-public/master' into xsn/oai_moe-gg
ggerganov Jul 24, 2025
539c2b6
Merge remote-tracking branch 'gg-public/master' into xsn/oai_moe-gg
ggerganov Jul 29, 2025
039a6f1
graph : fix activation function to match reference (#7)
ggerganov Jul 31, 2025
aa240b9
Merge branch 'master' into xsn/oai_moe-gg
ggerganov Jul 31, 2025
32a654c
Merge branch 'master' into xsn/oai_moe-gg
ggerganov Aug 1, 2025
13f3568
vocab : handle o200k_harmony special tokens
ggerganov Aug 1, 2025
e59b2eb
ggml : add attention sinks support (#1)
ggerganov Aug 1, 2025
832dc26
repack mxfp4 upon conversion
ngxson Aug 1, 2025
c68069d
clean up a bit
ngxson Aug 1, 2025
423b191
enable thinking
ngxson Aug 1, 2025
4dd479b
add quick hack to render only some special tokens
ngxson Aug 1, 2025
ebc7da5
fix bf16 conversion
ngxson Aug 1, 2025
a543ddf
remove vocab hack
ngxson Aug 1, 2025
6b30372
webui ok
ngxson Aug 1, 2025
44bdb75
support chat parsing for gpt-oss
ngxson Aug 1, 2025
65b536f
Merge branch 'master' into xsn/oai_moe
ggerganov Aug 2, 2025
6197917
fix webui
ngxson Aug 2, 2025
3c4725b
direct mapping mxfp4, FINALLY
ngxson Aug 2, 2025
04cfb6d
force using mxfp4
ngxson Aug 2, 2025
4cf69df
properly use lazy tensor
ngxson Aug 3, 2025
ec95c0e
ggml : add mxfp4
ggerganov Jul 20, 2025
3ef6c8c
ggml : add ggml_add_id (#13)
slaren Aug 4, 2025
cd514cc
Merge branch 'master' into xsn/oai_moe
slaren Aug 5, 2025
98c4be5
Merge branch 'xsn/oai_moe' into mxfp4-rebased
slaren Aug 5, 2025
fcb2339
Merge branch 'master' into gpt-oss-mxfp4
ngxson Aug 5, 2025
98f3444
llama : fix compile error
ggerganov Aug 5, 2025
df8411e
cuda : add fallback for __nv_cvt_e8m0_to_bf16raw
slaren Aug 5, 2025
60ab08a
cleanup
slaren Aug 5, 2025
256fe66
sycl : fix supports_op for MXFP4
slaren Aug 5, 2025
cd8ed32
fix Unknown reasoning format
ngxson Aug 5, 2025
a3b291e
ggml-cpu : fix AVX build
slaren Aug 5, 2025
1ea3769
fix hip build
slaren Aug 5, 2025
07d781e
cuda : add mxfp4 dequantization support for cuBLAS
slaren Aug 5, 2025
b236c90
ggml-cpu : fix mxfp4 fallback definitions for some architectures
slaren Aug 5, 2025
d9d89b4
cuda : fix version required for __nv_cvt_e8m0_to_bf16raw
slaren Aug 5, 2025
properly use lazy tensor
ngxson committed Aug 3, 2025
commit 4cf69dff63bf1ed7817bacf35cac3450523913e2
11 changes: 4 additions & 7 deletions convert_hf_to_gguf.py
@@ -7810,7 +7810,6 @@ class GptOssModel(TextModel):
     def transform_nibble_layout(self, tensor):
         assert tensor.dtype == torch.uint8
         assert tensor.shape[-1] == 16
-        tensor = tensor.clone().to(device="cpu")
         # swap nibbles
         t_lo = tensor & 0x0F
         t_hi = tensor & 0xF0
@@ -7839,15 +7838,13 @@ def repack_mxfp4(self, new_name: str, blocks: Tensor, scales: Tensor):
         scales = scales.unsqueeze(-1)
         assert len(blocks.shape) == 4
         assert len(scales.shape) == 4
-        # convert to numpy
-        scales = scales.to_eager(scales).numpy()
-        blocks = blocks.to_eager(blocks)
-        blocks = self.transform_nibble_layout(blocks).numpy()
-        new_data = np.concatenate([scales, blocks], axis=-1)
+        blocks = self.transform_nibble_layout(blocks)
+        new_data = torch.concat((scales, blocks), dim=-1)
         new_shape = [new_data.shape[0], new_data.shape[1], new_data.shape[2] * 32]
         logger.info(f"Repacked {new_name} with shape {new_shape} and quantization MXFP4")
         # flatten last dim
-        new_data = new_data.reshape(new_data.shape[0], new_data.shape[1], new_data.shape[2] * new_data.shape[3])
+        new_data = new_data.view(new_data.shape[0], new_data.shape[1], new_data.shape[2] * new_data.shape[3])
+        new_data = new_data.numpy()
         self.gguf_writer.add_tensor(new_name, new_data, raw_dtype=gguf.GGMLQuantizationType.MXFP4)

     def generate_extra_tensors(self) -> Iterable[tuple[str, Tensor]]:
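Note: below is a minimal eager sketch of the repacking step above, for readers following the diff. The shapes, the swap_nibbles helper, and the random test data are illustrative assumptions, not part of the PR; the converter's real transform_nibble_layout may reorder nibbles beyond the plain swap visible in the first hunk, and it operates on lazy tensors so the full MoE weights are never materialized at once.

    # Illustrative sketch (assumed names and shapes): each MXFP4 block stores
    # 32 FP4 weights as one E8M0 scale byte plus 16 bytes of packed nibbles,
    # i.e. 17 bytes per block of 32 weights.
    import torch

    def swap_nibbles(t: torch.Tensor) -> torch.Tensor:
        # Swap the low and high 4-bit halves of every byte, as in the
        # first hunk of the diff.
        assert t.dtype == torch.uint8
        return ((t & 0x0F) << 4) | ((t & 0xF0) >> 4)

    # Fake data: 2 experts x 4 rows x 3 blocks of 32 FP4 values each.
    blocks = torch.randint(0, 256, (2, 4, 3, 16), dtype=torch.uint8)  # packed nibbles
    scales = torch.randint(0, 256, (2, 4, 3, 1), dtype=torch.uint8)   # E8M0 scales

    blocks = swap_nibbles(blocks)
    new_data = torch.concat((scales, blocks), dim=-1)  # (..., 3, 17): scale + 16 data bytes
    # flatten the last two dims, mirroring the .view(...) call in the diff
    new_data = new_data.view(new_data.shape[0], new_data.shape[1], -1).numpy()
    print(new_data.shape)  # (2, 4, 51)

Keeping everything as torch operations until the final .numpy() is what makes the conversion "properly lazy": nothing is evaluated until gguf_writer actually serializes the tensor, which is why the commit drops the eager to_eager()/np.concatenate path.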