Hi llama.cpp team,
Greetings from the MiniCPM-V team! As presented in our recent Nature Communications paper, "Efficient GPT-4V level multimodal large language model for deployment on edge devices", our mission has always been to empower the open-source community with highly efficient, edge-deployable models.
In this pull request, I'm contributing support for MiniCPM-V 4.0, a multimodal model designed specifically for phone-sized devices.
As part of our effort to make this model broadly accessible, we plan to open-source the following three components:
Adaptation of MiniCPM-V 4.0 to llama.cpp
Apple NPU acceleration integrated into llama.cpp — to take full advantage of Apple's on-device hardware on macOS, iPadOS, and iOS.
A reference app demo built on top of the above adaptations — demonstrating how to deploy and run the multimodal model seamlessly on Apple devices. This app was recently showcased at the WAIC conference.
With these contributions, we hope to enable the community to run fast, efficient multimodal models across Mac/iPad/iPhone devices, and to customize or extend the codebase as needed.
This initial PR includes only the model integration and introduces minimal changes, so I hope it can be reviewed and merged quickly. The NPU acceleration PR will follow shortly; since it may involve more complex discussion around API design and integration, I would really appreciate the llama.cpp community's support and feedback during that process.
Below is a GIF recording of our demo running entirely on an iPhone in airplane mode, showcasing fully on-device deployment in action.
Looking forward to your review and collaboration!
Best regards,
MiniCPM-V team