Description
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
There appears to be a small memory leak in `ggml_metal_graph_compute`. After running continuous inference a few hundred times, I notice the amount of memory used on my M1 constantly growing.
I've been tracking this for a while, and it appears to come from the decode function, specifically from `ggml_metal_graph_compute`. I've removed the entire contents of the `dispatch_apply` block and the memory still seems to leak. There appear to be a few "known issues" around `MTLCommandBuffer` leaking memory [1, 2].

[1] https://developer.apple.com/forums/thread/662721
[2] https://forums.developer.apple.com/forums/thread/120931
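For context, a minimal sketch of the pattern those threads describe (illustrative only, not the actual llama.cpp code; the function and variable names are hypothetical): the command buffer returned by `-[MTLCommandQueue commandBuffer]` is autoreleased, so creating one on every call without a surrounding pool being drained lets those objects accumulate across calls.

```objc
#import <Metal/Metal.h>

// Illustrative only: repeatedly creating command buffers without draining an
// autorelease pool. The autoreleased MTLCommandBuffer objects (and whatever
// they retain) pile up until some outer pool is eventually drained.
static void run_many_leaky(id<MTLCommandQueue> queue, int n) {
    for (int i = 0; i < n; ++i) {
        id<MTLCommandBuffer> cb = [queue commandBuffer]; // autoreleased
        // ... encode compute work here ...
        [cb commit];
        [cb waitUntilCompleted];
    }
}
```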
There is a suggestion to use an `@autoreleasepool` when working with the `MTLCommandBuffer`. After adding this, I can confirm that the memory usage of llama.cpp stays stable even after 1,000 inference requests.
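A minimal sketch of the kind of change, assuming the per-call Metal work inside `ggml_metal_graph_compute` is wrapped in an `@autoreleasepool` (the function body below is illustrative, not the actual llama.cpp implementation):

```objc
#import <Metal/Metal.h>

// Sketch of the fix: drain an autorelease pool on every graph-compute call so
// the autoreleased Metal objects (command buffer, encoders) are released
// immediately instead of accumulating across hundreds of inference requests.
static void graph_compute_sketch(id<MTLCommandQueue> queue) {
    @autoreleasepool {
        id<MTLCommandBuffer> command_buffer = [queue commandBuffer];
        // ... dispatch_apply over the graph nodes and encode kernels here ...
        [command_buffer commit];
        [command_buffer waitUntilCompleted];
    } // pool drained here, on every call
}
```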