ggml-alloc : fix reuse-parent logic for misaligned sizes #17884
Merged
+15
−13
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
cont #16679
I encountered a bug in the logic for parent buffer reuse which occurs when the alloc size of the node and the parent are not aligned to the buffer alignment.
The logic in
ggml_gallocr_free_extra_spaceaims to free the extra space that is left when reusing a parent buffer with a larger alloc size than the node. This case occurs currently only with the Metal backend because we allocate additional size for some tensors:llama.cpp/ggml/src/ggml-metal/ggml-metal.cpp
Lines 183 to 217 in 0cdce38
The implementation of
ggml_gallocr_free_extra_spacedid not take into account that the passedextra_sizeto theggml_dyn_tallocr_free_tensorcall will be padded/aligned here:llama.cpp/ggml/src/ggml-alloc.c
Lines 312 to 316 in 0cdce38
Depending on the specific tensor sizes / alignment, this could lead to corruption of the allocator chunks. The fix in this PR takes into account the alignment before computing the
extra_size, which guarantees that after freeing it, the chunks will remain aligned.Also:
ggml_dyn_tallocr_free_tensor->ggml_dyn_tallocr_free_bytesGGML_ALLOCATOR_DEBUGand using the Metal backend. The problem was thatggml_dyn_tallocr_free_tensorassumed to always remove an allocated tensor and hence theremove_allocated_tensor()debug call in it. However, when we free extra bytes from a parent buffer, we are not actually removing a tensor - just the extra bytes. To fix that, we now callremove_allocated_tensor()only inggml_gallocr_free_node()