ggml-alloc : fix reuse-parent logic for misaligned sizes #17884

ggerganov · 2025-12-09T09:28:57Z

I encountered a bug in the parent buffer reuse logic that occurs when the alloc sizes of the node and the parent are not multiples of the buffer alignment.

The logic in ggml_gallocr_free_extra_space aims to free the extra space that is left when reusing a parent buffer with a larger alloc size than the node. This case occurs currently only with the Metal backend because we allocate additional size for some tensors:

llama.cpp/ggml/src/ggml-metal/ggml-metal.cpp

Lines 183 to 217 in 0cdce38

    
    static size_t ggml_backend_metal_buffer_type_get_alloc_size(ggml_backend_buffer_type_t buft, const ggml_tensor * tensor) {
        size_t res = ggml_nbytes(tensor);

        // some operations require additional memory for fleeting data:
        switch (tensor->op) {
            case GGML_OP_MUL_MAT_ID:
                {
                    res += ggml_metal_op_mul_mat_id_extra_tpe(tensor);
                    res += ggml_metal_op_mul_mat_id_extra_ids(tensor);
                } break;
            case GGML_OP_FLASH_ATTN_EXT:
                {
                    res += ggml_metal_op_flash_attn_ext_extra_pad(tensor);
                    res += ggml_metal_op_flash_attn_ext_extra_blk(tensor);
                    res += ggml_metal_op_flash_attn_ext_extra_tmp(tensor);
                } break;
            case GGML_OP_CUMSUM:
            case GGML_OP_ARGSORT:
                {
                    res *= 2;
                } break;
            case GGML_OP_TOP_K:
                {
                    res = 2*sizeof(int32_t)*ggml_nelements(tensor->src[0]);
                } break;
            default:
                break;
        }

        return res;

        GGML_UNUSED(buft);
    }

The implementation of ggml_gallocr_free_extra_space did not take into account that the extra_size passed to ggml_dyn_tallocr_free_tensor is padded up to the buffer alignment here:

llama.cpp/ggml/src/ggml-alloc.c

Lines 312 to 316 in 0cdce38

    
    // this is a very naive implementation, but for our case the number of free blocks should be very small
    static void ggml_dyn_tallocr_free_tensor(struct ggml_dyn_tallocr * alloc, struct buffer_address addr, size_t size, const struct ggml_tensor * tensor) {
        size = aligned_offset(NULL, size, alloc->alignment);

Depending on the specific tensor sizes / alignment, this could lead to corruption of the allocator chunks. The fix in this PR takes into account the alignment before computing the extra_size, which guarantees that after freeing it, the chunks will remain aligned.
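To make the failure mode concrete, here is a minimal standalone sketch, not the actual ggml-alloc code. The names `pad_to`, `free_range`, `extra_unaligned` and `extra_aligned` are hypothetical and exist only for this illustration; offsets are relative to the start of the parent's allocation. It compares the byte range handed back to the free list when the difference is taken on the raw sizes against the range obtained when both sizes are padded first.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: round `size` up to a multiple of `alignment`.
// Assumes `alignment` is a non-zero power of two.
static size_t pad_to(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

// Byte range [begin, end) returned to the free list,
// relative to the start of the parent's allocation.
struct free_range {
    size_t begin;
    size_t end;
};

// Sketch of the problematic order: the difference is computed on the
// raw sizes, and the free routine pads the resulting size afterwards.
static struct free_range extra_unaligned(size_t parent_size, size_t node_size, size_t alignment) {
    struct free_range r;
    r.begin = node_size;
    r.end   = node_size + pad_to(parent_size - node_size, alignment);
    return r;
}

// Sketch of the corrected order: pad both sizes first, then take the
// difference, so the later padding in the free routine changes nothing.
static struct free_range extra_aligned(size_t parent_size, size_t node_size, size_t alignment) {
    size_t p = pad_to(parent_size, alignment);
    size_t n = pad_to(node_size, alignment);
    struct free_range r;
    r.begin = n;
    r.end   = n + pad_to(p - n, alignment);
    return r;
}
```

With an alignment of 32, a parent alloc size of 100 and a node alloc size of 70, `extra_unaligned` yields the range [70, 102), which starts in the middle of an aligned block and stops short of the parent's padded end at 128. `extra_aligned` yields [96, 128), which is exactly the aligned tail of the parent's allocation.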

Also:

- Rename ggml_dyn_tallocr_free_tensor -> ggml_dyn_tallocr_free_bytes
- Fix crash when building with GGML_ALLOCATOR_DEBUG and using the Metal backend. The problem was that ggml_dyn_tallocr_free_tensor assumed that it always removes an allocated tensor, hence the remove_allocated_tensor() debug call inside it. However, when we free extra bytes from a parent buffer, we are not actually removing a tensor, just the extra bytes. To fix that, we now call remove_allocated_tensor() only in ggml_gallocr_free_node().


ggml-alloc : fix reuse-parent logic for misaligned sizes

0812b55

loci-dev mentioned this pull request Dec 9, 2025

UPSTREAM PR #17884: ggml-alloc : fix reuse-parent logic for misaligned sizes auroralabs-loci/llama.cpp#498

Open

github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Dec 9, 2025

ggerganov merged commit c6f6e4f into master Dec 11, 2025
74 of 78 checks passed

ggerganov deleted the gg/ggml-alloc-fix-misaligned-reuse branch December 11, 2025 12:30

Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Dec 12, 2025

ggml-alloc : fix reuse-parent logic for misaligned sizes (ggml-org#17884)

e08cb85
