Memory leak in Mac Metal ggml_metal_graph_compute #5436

@irbull

Description

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.

There appears to be a small memory leak in ggml_metal_graph_compute. After running continual inference a few hundred times, I notice the memory usage on my M1 growing steadily.

I've been tracking this for a while, and it appears to come from the decode function, specifically from ggml_metal_graph_compute. I've removed the entire contents of the dispatch_apply block and the memory still seems to be leaking. There appear to be a few known issues around MTLCommandBuffer leaking memory [1,2].

[1] https://developer.apple.com/forums/thread/662721
[2] https://forums.developer.apple.com/forums/thread/120931
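The threads above describe the general pattern behind this kind of leak. As an illustration only (the function and variable names below are not from llama.cpp), the sketch shows how autoreleased MTLCommandBuffer objects can pile up when a long-running worker thread never drains an autorelease pool between calls:

#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Illustrative sketch of the leak pattern (not the actual llama.cpp code).
// [queue commandBuffer] returns an autoreleased object; on a long-running
// thread that never drains an autorelease pool, these buffers (and the
// objects they retain) accumulate, so the process footprint grows with
// each inference.
static void leaky_compute_loop(id<MTLCommandQueue> queue, int n_iters) {
    for (int i = 0; i < n_iters; ++i) {
        id<MTLCommandBuffer> cb = [queue commandBuffer]; // autoreleased
        // ... encode kernels here ...
        [cb commit];
        [cb waitUntilCompleted];
        // no @autoreleasepool: the buffer stays pending in the thread's pool
    }
}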

There is a suggestion to use an @autoreleasepool when working with the MTLCommandBuffer. After adding this, I can confirm that the memory usage of llama.cpp stays stable even after 1,000 inference requests.
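A minimal sketch of what that fix looks like, assuming the per-call structure of ggml_metal_graph_compute (the function name and local variables here are illustrative, not the actual llama.cpp code): wrap each graph evaluation in an @autoreleasepool so the autoreleased command buffer and encoder objects are released at the end of every call rather than accumulating.

#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Minimal sketch of the suggested fix: the @autoreleasepool drains at the
// end of each call, releasing the autoreleased MTLCommandBuffer and any
// temporary Metal objects created while encoding the graph.
static void graph_compute_fixed(id<MTLCommandQueue> queue) {
    @autoreleasepool {
        id<MTLCommandBuffer> command_buffer = [queue commandBuffer];

        // ... encode and dispatch the graph kernels (e.g. via dispatch_apply) ...

        [command_buffer commit];
        [command_buffer waitUntilCompleted];
    } // pool drains here, so temporary Metal objects do not outlive the call
}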
