Smolagents: support ImageContent and AudioContent #61
Hey @HSGamer, thanks a lot for the contribution. It's a very interesting feature that we wanted to add. I will have a look as soon as possible!
# Conflicts: # uv.lock
src/mcpadapt/smolagents_adapter.py
      self.skip_forward_signature_validation = True

    - def forward(self, *args, **kwargs) -> str:
    + def forward(self, *args, **kwargs):
maybe we could type the return type here as image, audio or text
Since Pillow and torchaudio are optional, I'm not sure that it won't throw error if the packages are not available
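One way to reconcile the two comments above is to import the optional packages only for the type checker. The following is a minimal sketch, not the code that was merged; `ForwardResult` and `ExampleTool` are hypothetical names used for illustration, and it assumes the image type is `PIL.Image.Image` and the audio type is `torch.Tensor`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Evaluated only by type checkers such as mypy, never at runtime,
    # so a missing Pillow or torch installation cannot raise ImportError here.
    from PIL.Image import Image as PILImage
    from torch import Tensor

# Hypothetical alias: the forward references stay unresolved strings at runtime.
ForwardResult = Union[str, "PILImage", "Tensor"]


class ExampleTool:
    """Hypothetical stand-in for the adapted tool class."""

    def forward(self, *args, **kwargs) -> ForwardResult:
        # A real tool would return text, an image, or an audio tensor.
        return "text result"
```

Because the annotation is never evaluated at runtime, the class can be defined and called even when neither optional package is installed.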
grll left a comment
Overall I think it's a very good change. I think it's great that it uses PIL and torchaudio, as in Smolagents. Sorry for taking so long to review. Could you please add tests and better documentation around the extra to install?
@HSGamer we are almost there. Lint and tests are failing, though. Could we also simplify the tests so that they do not create the audio or the image every time, and instead create each once and commit it as a file in
@grll I think I fixed the failing tests, at least
Thanks for the changes! For some reason the tests still fail in CI. I will have a look tomorrow.
grll left a comment
Fixed the typing and the failing tests. Many thanks for bringing this feature into mcpadapt, @HSGamer.
* Smolagents: support ImageContent and AudioContent
* update uv lock
* add audio to test
* change the command in audio
* add a note about audio package in smolagents docs
* add test_image
* add test_audio
* add sample files
* use pytest-datadir to get the sample files
* assert to make sure the right image size
* Any result-type
* add an audio backend via soundfile package to make tests work
* fix missing pytest fixture for the datadir
* fix mypy warning no stubs
* fix format
* improve typing of the forward function
* fix type definition

---------

Co-authored-by: Guillaume Raille <[email protected]>
Smolagents has a method that dynamically handles the type of a tool result (https://github.com/huggingface/smolagents/blob/fc73322658a2c261cf59d817660c6c88d510431b/src/smolagents/agent_types.py#L262-L280). It supports Text as str, Image as PIL.Image, and Audio as Tensor. This PR converts the tool result content to match the supported types.
This is useful for ToolCallingAgent since it supports all types of content. It's not useful for CodeAgent at the moment since it only supports Text, but this can be a preparation for when CodeAgent is upgraded to support all types.
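The conversion described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code in `smolagents_adapter.py`: the three dataclasses are hypothetical stand-ins for the MCP content types (assumed to carry base64-encoded `data`), `to_smolagents_value` is a made-up helper name, and the audio branch assumes `torchaudio.load` accepts a file-like object in the installed version.

```python
import base64
import io
from dataclasses import dataclass


# Hypothetical stand-ins for the MCP content types, for illustration only.
@dataclass
class TextContent:
    text: str


@dataclass
class ImageContent:
    data: str  # assumed to be base64-encoded image bytes
    mimeType: str


@dataclass
class AudioContent:
    data: str  # assumed to be base64-encoded audio bytes
    mimeType: str


def to_smolagents_value(content):
    """Map one content item to str, a PIL image, or an audio tensor."""
    if isinstance(content, TextContent):
        return content.text
    if isinstance(content, ImageContent):
        try:
            from PIL import Image  # optional dependency, imported lazily
        except ImportError as exc:
            raise ImportError("Pillow is required for image results") from exc
        return Image.open(io.BytesIO(base64.b64decode(content.data)))
    if isinstance(content, AudioContent):
        try:
            import torchaudio  # optional dependency, imported lazily
        except ImportError as exc:
            raise ImportError("torchaudio is required for audio results") from exc
        # Assumption: this torchaudio version can load from a file-like object.
        waveform, _sample_rate = torchaudio.load(
            io.BytesIO(base64.b64decode(content.data))
        )
        return waveform
    raise ValueError(f"Unsupported content type: {type(content).__name__}")
```

Importing Pillow and torchaudio inside the branches keeps text-only tools working when the optional packages are not installed.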