@ggerganov ggerganov commented Dec 9, 2025

cont #16679

I encountered a bug in the parent buffer reuse logic. It occurs when the alloc sizes of the node and the parent are not multiples of the buffer alignment.

The logic in ggml_gallocr_free_extra_space aims to free the extra space that is left over when a node reuses a parent buffer whose alloc size is larger than the node's. This case currently occurs only with the Metal backend, because we allocate additional size for some tensors:

static size_t ggml_backend_metal_buffer_type_get_alloc_size(ggml_backend_buffer_type_t buft, const ggml_tensor * tensor) {
    size_t res = ggml_nbytes(tensor);

    // some operations require additional memory for fleeting data:
    switch (tensor->op) {
        case GGML_OP_MUL_MAT_ID:
            {
                res += ggml_metal_op_mul_mat_id_extra_tpe(tensor);
                res += ggml_metal_op_mul_mat_id_extra_ids(tensor);
            } break;
        case GGML_OP_FLASH_ATTN_EXT:
            {
                res += ggml_metal_op_flash_attn_ext_extra_pad(tensor);
                res += ggml_metal_op_flash_attn_ext_extra_blk(tensor);
                res += ggml_metal_op_flash_attn_ext_extra_tmp(tensor);
            } break;
        case GGML_OP_CUMSUM:
        case GGML_OP_ARGSORT:
            {
                res *= 2;
            } break;
        case GGML_OP_TOP_K:
            {
                res = 2*sizeof(int32_t)*ggml_nelements(tensor->src[0]);
            } break;
        default:
            break;
    }

    return res;

    GGML_UNUSED(buft);
}

The implementation of ggml_gallocr_free_extra_space did not take into account that the extra_size passed to ggml_dyn_tallocr_free_tensor is padded/aligned inside that call:

// this is a very naive implementation, but for our case the number of free blocks should be very small
static void ggml_dyn_tallocr_free_tensor(struct ggml_dyn_tallocr * alloc, struct buffer_address addr, size_t size, const struct ggml_tensor * tensor) {
    size = aligned_offset(NULL, size, alloc->alignment);

Depending on the specific tensor sizes / alignment, this could lead to corruption of the allocator chunks. The fix in this PR takes into account the alignment before computing the extra_size, which guarantees that after freeing it, the chunks will remain aligned.
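To make the failure mode concrete, here is a minimal sketch of the two ways of computing the freed range, relative to the start of the parent allocation. It is not the actual ggml code: `align_up`, `struct free_range`, `extra_space_unaligned` and `extra_space_aligned` are hypothetical names introduced for illustration, and the sketch assumes the alignment is a power of two and that the parent allocation itself occupies its alloc size rounded up to the alignment.

```c
#include <stddef.h>

// Round size up to the next multiple of alignment.
// Assumption: alignment is a power of two.
static size_t align_up(size_t size, size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Range handed back to the allocator, relative to the parent's start offset.
struct free_range {
    size_t offset;
    size_t size;
};

// Old behaviour (sketch): the raw difference is computed first and only
// padded later, inside the free call. The range starts at an unaligned
// offset and may extend past the end of the parent allocation.
static struct free_range extra_space_unaligned(size_t parent_size, size_t node_size, size_t alignment) {
    struct free_range r;
    r.offset = node_size;
    r.size   = align_up(parent_size - node_size, alignment);
    return r;
}

// Fixed behaviour (sketch): both sizes are aligned before subtracting, so
// the freed range starts and ends on alignment boundaries and stays inside
// the parent allocation.
static struct free_range extra_space_aligned(size_t parent_size, size_t node_size, size_t alignment) {
    const size_t parent_aligned = align_up(parent_size, alignment);
    const size_t node_aligned   = align_up(node_size,   alignment);

    struct free_range r;
    r.offset = node_aligned;
    r.size   = parent_aligned - node_aligned;
    return r;
}
```

For example, with an alignment of 32, a parent alloc size of 100 and a node alloc size of 97, the parent occupies 128 bytes. The unaligned variant frees 32 bytes starting at offset 97, which reaches one byte past the end of the parent, while the aligned variant frees nothing because both sizes round up to 128.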

Also:

  • Rename ggml_dyn_tallocr_free_tensor -> ggml_dyn_tallocr_free_bytes
  • Fix crash when building with GGML_ALLOCATOR_DEBUG and using the Metal backend. The problem was that ggml_dyn_tallocr_free_tensor assumed it always removes an allocated tensor, which is why it contained the remove_allocated_tensor() debug call. However, when we free extra bytes from a parent buffer, we are not actually removing a tensor, just the extra bytes. To fix that, we now call remove_allocated_tensor() only in ggml_gallocr_free_node()
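
The separation of concerns described in the last item can be illustrated with a toy model. This is only a sketch under assumptions, not the ggml debug code: `struct debug_table`, `track_add`, `track_remove`, `free_bytes` and `free_node` are hypothetical names, the free list is reduced to a single byte counter, and an untracked tensor is reported with a return value of 0 rather than whatever the real debug build does.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TRACKED 8

// Toy debug table: names of the tensors currently considered allocated.
struct debug_table {
    const char * names[MAX_TRACKED];
    int          count;
};

static void track_add(struct debug_table * t, const char * name) {
    if (t->count < MAX_TRACKED) {
        t->names[t->count++] = name;
    }
}

// Returns 1 if the tensor was tracked and has been removed, 0 otherwise.
static int track_remove(struct debug_table * t, const char * name) {
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0) {
            t->names[i] = t->names[--t->count]; // swap-remove
            return 1;
        }
    }
    return 0;
}

// Byte-level free: returns a range to the (toy) free list and never touches
// the debug table, so it is safe to call for a partial free.
static void free_bytes(size_t * free_total, size_t size) {
    *free_total += size;
}

// Node-level free: the only place that untracks a tensor.
static int free_node(struct debug_table * t, size_t * free_total, const char * name, size_t size) {
    const int was_tracked = track_remove(t, name);
    free_bytes(free_total, size);
    return was_tracked;
}
```

With this split, freeing the extra bytes of a parent leaves the parent in the debug table, so the later node-level free still finds it.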

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Dec 9, 2025
@ggerganov ggerganov merged commit c6f6e4f into master Dec 11, 2025
74 of 78 checks passed
@ggerganov ggerganov deleted the gg/ggml-alloc-fix-misaligned-reuse branch December 11, 2025 12:30