@HSGamer
Contributor

@HSGamer HSGamer commented Jun 25, 2025

Smolagents does have a method to dynamically handle types of tool result (https://github.com/huggingface/smolagents/blob/fc73322658a2c261cf59d817660c6c88d510431b/src/smolagents/agent_types.py#L262-L280). It supports Text as str, Image as PIL.Image and Audio as Tensor.

This PR modifies the returned content to match the supported types.

This is useful for ToolCallingAgent since it supports all types of content. It's not useful for CodeAgent at the moment since it only supports Text, but this can be a preparation for when CodeAgent is upgraded to support all types.
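To illustrate the idea, here is a minimal sketch of such a conversion, not the code from this PR. It assumes each MCP content item is available as a plain dict with the `type`, `text`, and base64-encoded `data` fields from the MCP specification. The function name `convert_content` is hypothetical, and the optional packages are imported lazily so the text path works without them.

```python
import base64
from io import BytesIO
from typing import Any


def convert_content(item: dict[str, Any]) -> Any:
    """Map one MCP-style content item to a plain Python value (hypothetical helper)."""
    kind = item.get("type")
    if kind == "text":
        # Text needs no extra dependency.
        return item["text"]
    if kind == "image":
        # Pillow is optional, so import it only when an image arrives.
        from PIL import Image

        return Image.open(BytesIO(base64.b64decode(item["data"])))
    if kind == "audio":
        # torchaudio is optional too; decoding from a file-like object
        # also assumes a suitable audio backend is installed.
        import torchaudio

        waveform, _sample_rate = torchaudio.load(BytesIO(base64.b64decode(item["data"])))
        return waveform
    raise ValueError(f"Unsupported content type: {kind!r}")
```

With this shape, a text-only setup never touches Pillow or torchaudio, and a missing package only surfaces as an `ImportError` when an image or audio result is actually returned.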

@grll
Owner

grll commented Jun 26, 2025

Hey @HSGamer, thanks a lot for the contribution. It's a very interesting feature that we wanted to add. I will have a look as soon as possible!

self.skip_forward_signature_validation = True

- def forward(self, *args, **kwargs) -> str:
+ def forward(self, *args, **kwargs):
Owner


Maybe we could annotate the return type here as image, audio, or text.

Contributor Author


Since Pillow and torchaudio are optional, I'm not sure that it won't throw an error if the packages are not available.
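One common way to annotate a return type without importing optional packages at runtime is a `typing.TYPE_CHECKING` guard combined with a string annotation. The sketch below only illustrates that pattern and is not the typing that was eventually merged; `ExampleTool` is a hypothetical class.

```python
from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Seen only by static type checkers, so a missing Pillow or torch
    # installation cannot raise ImportError at runtime.
    import torch
    from PIL.Image import Image as PILImage


class ExampleTool:  # hypothetical stand-in for the adapter's tool class
    def forward(self, *args, **kwargs) -> "Union[str, PILImage, torch.Tensor]":
        # The annotation is a string, so it is not evaluated when the
        # function is defined.
        return "ok"
```

The trade-off is that anything resolving the annotation at runtime, such as `typing.get_type_hints`, would still fail when the optional packages are absent, because the names exist only for the type checker.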

Owner

@grll grll left a comment


Overall I think it's a very good change. I think it's great that it is using PIL and torchaudio as in Smolagents. Sorry for taking so long to review. Could you please add tests and better documentation about the extra needed for these imports?

@HSGamer HSGamer requested a review from grll July 9, 2025 02:43
@grll
Owner

grll commented Jul 13, 2025

@HSGamer we are almost there. Lint and tests are failing, though. Could we also simplify the tests: instead of creating the audio or the image every time, create it once and commit it as a file in tests/data, for example? I think this could cut the test code by a significant amount. Thanks a lot for this!

@HSGamer
Contributor Author

HSGamer commented Jul 13, 2025

@grll I think I fixed the failing tests, at least make lint and make tests work on my computer. About the files, I created the sample files and added a module to load them. Can you check again?
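As a rough illustration of the "create once, commit, and load" approach, a loader module could look like the sketch below. This is not the module added in the PR; the `tests/data` default follows the suggestion above, and `load_sample` and its parameters are hypothetical names.

```python
from pathlib import Path

# Assumed location of the committed sample files (from the review suggestion).
DATA_DIR = Path("tests") / "data"


def load_sample(name: str, data_dir: Path = DATA_DIR) -> bytes:
    """Return the raw bytes of a committed sample file (hypothetical helper)."""
    path = data_dir / name
    if not path.is_file():
        # Fail loudly so a missing fixture is not mistaken for a tool bug.
        raise FileNotFoundError(f"Missing sample file: {path}")
    return path.read_bytes()
```

A test can then call `load_sample` with the name of a committed file instead of generating the image or audio on every run.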

@grll
Owner

grll commented Jul 13, 2025

> @grll I think I fixed the failing tests, at least make lint and make tests work on my computer. About the files, I created the sample files and added a module to load them. Can you check again?

Thanks for the changes! For some reason the tests still fail in CI. I will have a look tomorrow.

Owner

@grll grll left a comment


Fixed the typing and the failing tests. Many thanks for bringing this feature into mcpadapt, @HSGamer.

@grll grll merged commit a75387f into grll:main Jul 14, 2025
3 checks passed
amithkk pushed a commit to amithkk/mcpadapt that referenced this pull request Sep 6, 2025
* Smolagents: support ImageContent and AudioContent

* update uv lock

* add audio to test

* change the command in audio

* add a note about audio package in smolagents docs

* add test_image

* add test_audio

* add sample files

* use pytest-datadir to get the sample files

* assert to make sure the right image size

* Any result-type

* add an audio backend via soundfile package to make tests work

* fix missing pytest fixture for the datadir

* fix mypy warning no stubs

* fix format

* improve typing of the forward function

* fix type definition

---------

Co-authored-by: Guillaume Raille <[email protected]>