convert : force patch_embd weights to F16 or F32 to avoid broken GGUFs #15367

Conversation
I mean we can extend this list: llama.cpp/convert_hf_to_gguf.py, lines 314 to 330 at de56279.
But I'm not 100% sure it works, though.
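As a rough illustration of what extending that block could mean, here is a standalone sketch. The suffixes (and the patch-embedding entry) are assumptions; the actual code in `prepare_tensors()` keys off `match_model_tensor_name()` and `gguf.MODEL_TENSOR` constants rather than name suffixes.

```python
# Hedged sketch, not the actual convert_hf_to_gguf.py code at de56279.
import gguf

FORCE_F32_SUFFIXES = (
    ".ffn_gate_inp.weight",   # MoE router weights are kept in F32
    ".position_embd.weight",  # position embeddings
    ".patch_embd.weight",     # hypothetical addition for the vision patch embedding
)

def forced_dtype(new_name: str):
    """Return a forced quantization type for tensors that must stay in F32, else None."""
    if any(new_name.endswith(suffix) for suffix in FORCE_F32_SUFFIXES):
        return gguf.GGMLQuantizationType.F32
    return None
```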
Sure, I understood, I just wasn't sure it did either. :) Edit: It works, but ... Anyway, since it's only a fallback for ...
On second thought, this is a bit strange. IM2COL takes 2 inputs: the kernel tensor and the input (data) tensor.
So logically speaking, the type of IM2COL's output should only depend on the input tensor. Looking at the CUDA kernel, the data of the kernel tensor is not even used (llama.cpp/ggml/src/ggml-cuda/im2col.cu, lines 99 to 100 at de56279).
So I'm wondering if the problem could be due to another reason.
It's because im2col's output type is set to the kernel dtype.
Hmm, OK, in this case I think the current approach is safer. Still, it's a bit strange that im2col's output type is the same as the kernel dtype; I expected it to be the same as the input dtype. But it would be quite risky to change it now, as it would be a breaking change.
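For reference, one quick way to check whether a converted mmproj file hit this failure mode is to inspect the patch embedding tensor's dtype with gguf-py's reader (the file name below is illustrative):

```python
# Inspect the dtype of the patch embedding tensor in a converted GGUF.
from gguf import GGUFReader

reader = GGUFReader("mmproj-model.gguf")
for tensor in reader.tensors:
    if "patch_embd" in tensor.name:
        # Anything other than F16/F32 here (e.g. BF16) will break the IM2COL op.
        print(tensor.name, tensor.tensor_type.name)
```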
Force `patch_embd` weights to F16 or F32 to avoid broken GGUFs (e.g. when using `--outtype bf16`), as the `IM2COL` op requires F16 or F32. Only use F16 if forced by the user or guessed; F32 is the safest choice, and the tensor is not that large anyway.
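A minimal sketch of that rule, assuming a hook that returns an override quantization type for the tensor (names are illustrative, not the PR's actual diff):

```python
# Minimal sketch of the dtype rule described above; not the PR's exact code.
import gguf

def patch_embd_dtype(ftype: gguf.LlamaFileType) -> gguf.GGMLQuantizationType:
    """Pick a dtype for patch_embd that IM2COL can consume (F16 or F32 only)."""
    if ftype in (gguf.LlamaFileType.MOSTLY_F16, gguf.LlamaFileType.GUESSED):
        return gguf.GGMLQuantizationType.F16
    # Safest fallback for every other outtype (bf16, q8_0, ...): plain F32.
    return gguf.GGMLQuantizationType.F32
```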