
Conversation

@irbull (Contributor) commented Feb 10, 2024

There appears to be a known memory leak when using `MTLCommandBuffer`. Using `@autoreleasepool` is suggested in [1, 2].

[1] https://developer.apple.com/forums/thread/662721
[2] https://forums.developer.apple.com/forums/thread/120931

This change-set wraps `ggml_metal_graph_compute` in an `@autoreleasepool` block.

This commit addresses #5436
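
As a sketch of the shape of the fix (illustrative code, not the actual ggml-metal.m diff; the function name and signature here are simplified stand-ins):

```objc
#import <Metal/Metal.h>

// Simplified sketch: the real function takes ggml context/graph arguments.
void ggml_metal_graph_compute_sketch(id<MTLCommandQueue> queue) {
    @autoreleasepool {
        // commandBuffer returns an autoreleased object; the enclosing
        // pool guarantees it is released when each call returns instead
        // of accumulating in an outer (possibly never-drained) pool.
        id<MTLCommandBuffer> command_buffer = [queue commandBuffer];

        // ... encode the graph's kernels here ...

        [command_buffer commit];
        [command_buffer waitUntilCompleted];
    }
}
```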

@irbull (Contributor, Author) commented Feb 10, 2024

It seems the `@autoreleasepool` was removed recently [1], but my testing on an M1 indicates it is still needed, as do the referenced links.

[1] #5007

@irbull (Contributor, Author) commented Feb 10, 2024

To test this, I set up a test system that ran the same inference repeatedly over the course of a few hours (several thousand inference requests). The memory use of llama.cpp grew from 250 MB to over 1.5 GB. With this change-set applied, memory remained constant at 250 MB on my system (as measured with top).
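
A rough sketch of that kind of leak harness, assuming a hypothetical `run_inference()` stand-in for one llama.cpp request, and using Mach task info to report resident memory:

```objc
#include <mach/mach.h>
#include <stdio.h>

// Hypothetical stand-in for one inference request against llama.cpp.
static void run_inference(void) { /* ... */ }

// Resident set size of the current process, in bytes.
static size_t resident_bytes(void) {
    struct mach_task_basic_info info;
    mach_msg_type_number_t count = MACH_TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), MACH_TASK_BASIC_INFO,
                  (task_info_t)&info, &count) != KERN_SUCCESS) {
        return 0;
    }
    return (size_t)info.resident_size;
}

int main(void) {
    for (int i = 0; i < 10000; i++) {
        run_inference();
        if (i % 1000 == 0) {
            printf("iter %d: %.1f MB resident\n", i, resident_bytes() / 1e6);
        }
    }
    return 0;
}
```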

@ggerganov (Member) left a comment:

Thanks for looking into this

@ggerganov merged commit f026f81 into ggml-org:master on Feb 10, 2024
@ptsochantaris (Contributor) commented:

Nice catch - sorry for not noticing this when I made the original PR. I tested for memory leaks after the change and didn't find any, BUT that was in a UI app. If the app doesn't have a main runloop (i.e. a CLI or other non-UI app) and no global autorelease pool is set up, or the code runs in a loop that never ends like in your test, the autoreleased command buffers will stay around waiting for a drain that will never come. That was a bit short-sighted of me, sorry! Very glad you spotted and fixed that issue.
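
For illustration, here is a minimal sketch of that drain behaviour (the `NSString` is just a stand-in for an autoreleased Metal object such as a command buffer):

```objc
#import <Foundation/Foundation.h>

int main(void) {
    // In a CLI tool there is no main runloop, so nothing ever drains the
    // thread's implicit autorelease pool between iterations.
    for (int i = 0; i < 1000000; i++) {
        @autoreleasepool {
            // stringWithFormat: returns an autoreleased object, just as
            // -[MTLCommandQueue commandBuffer] does under MRC.
            NSString *s = [NSString stringWithFormat:@"request %d", i];
            (void)s;
        } // the pool drains here, once per iteration; without it, every
          // object created above would accumulate until the process exits
    }
    return 0;
}
```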

BTW - I've tried making versions of this file that use ARC so I could propose discarding all the memory management from 2010 :trollface: but unfortunately ARC does not play well with struct fields that point to created objects (Clang in theory supports __strong references in structs, but somehow I never got it to work properly :)). The local structs could be replaced with ObjC objects, but I suspect that would hurt performance. Ideally this should all eventually become Swift, but that's a discussion for a totally different time and place :-P
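
For reference, the ARC limitation mentioned above looks roughly like this — a hypothetical sketch, not project code, and whether it compiles cleanly depends on the Clang version, since ARC-qualified object pointers in plain C structs are only accepted by newer compilers:

```objc
// Compile with -fobjc-arc. A hypothetical kernel struct in the style of
// ggml-metal.m; under ARC the object-pointer field must be qualified.
#import <Metal/Metal.h>

struct metal_kernel_sketch {
    __strong id<MTLComputePipelineState> pipeline; // ARC-managed field
};
```

Newer Clang releases treat such structs as "non-trivial" C structs and generate the retain/release traffic automatically, which is part of why mixing them with performance-sensitive plain-C code gets awkward.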

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024