@julien-nc
Member

This is a step towards making voice interaction with the Assistant possible.
The Assistant chat interface will schedule such audio-to-audio chat tasks if Agency is not available.

This does not cover Agency for now.
Do we make a ContextAgentVoiceInteraction task type that takes voice as input?
We just need to support voice input and voice output. In between, the interactions between the agent and the model are text-based. So the Context Agent app would most likely still schedule TextToTextChatWithTools tasks.
We could also take the easy road and chain three steps when Agency is involved: STT -> Context Agent -> TTS. That way we would need no change in Context Agent and no new task type.
To be discussed.
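
For illustration, the three-step "easy road" (STT -> Context Agent -> TTS) could look roughly like the sketch below. It is written in Python rather than the PHP of the actual change, and every name in it (`VoiceTurn`, `voice_interaction`, the `fake_*` providers) is hypothetical; it only shows how the steps chain together, with text as the only thing passed between them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """Result of one voice exchange (hypothetical structure)."""
    input_transcript: str   # text produced by the speech-to-text step
    output_transcript: str  # text answer produced by the agent
    output_audio: bytes     # synthesized audio of the answer


def voice_interaction(
    audio_in: bytes,
    stt: Callable[[bytes], str],
    agent: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> VoiceTurn:
    """Chain the three steps; the agent itself only ever sees text."""
    text_in = stt(audio_in)       # step 1: speech-to-text
    text_out = agent(text_in)     # step 2: text-based agent call
    audio_out = tts(text_out)     # step 3: text-to-speech
    return VoiceTurn(text_in, text_out, audio_out)


# Stand-in providers so the sketch runs without any real model.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


turn = voice_interaction(b"hello", fake_stt, fake_agent, fake_tts)
```

Because both transcripts fall out of the intermediate steps, this variant yields the text needed for a text-only chat history without extra work.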

@julien-nc julien-nc added this to the Nextcloud 32 milestone Jul 2, 2025
@julien-nc julien-nc requested a review from a team as a code owner July 2, 2025 10:31
@julien-nc julien-nc removed the request for review from a team July 2, 2025 10:31
'history' => new ShapeDescriptor(
$this->l->t('Chat history'),
$this->l->t('The history of chat messages before the current message, starting with a message by the user'),
EShapeType::ListOfTexts
Member

Where do we get the texts from?

Member Author

From the chat conversation history in the chat UI.
Ideally we would need a mixed list of text and audio, but since we get the transcription of audio responses (as an optional output), my plan is to store it in a chat message so the history can stay text-only for now. Wdyt?

Member

I see. But if the transcription is an optional output, we may not get it when other providers implement this task type. Do we schedule another transcription task then?

Member Author

I think so, yes.
I even think the text output could be mandatory and part of the task type output shape (because we need it for the history). Wdyt?

Member

Yeah, I was thinking that as well. Let's make it mandatory.

Member Author

The input transcription is optional though. Or is it? We need it too for the history.

@julien-nc julien-nc force-pushed the enh/noid/taskpro-audio-chat branch from a549444 to b9c1fc0 Compare July 3, 2025 08:43
@julien-nc julien-nc force-pushed the enh/noid/taskpro-audio-chat branch from b9c1fc0 to cce12a9 Compare July 4, 2025 09:02
Contributor

@kyteinsky kyteinsky left a comment

🚀

@julien-nc julien-nc force-pushed the enh/noid/taskpro-audio-chat branch from cce12a9 to af059cb Compare July 7, 2025 09:39
@julien-nc
Member Author

Rebased on master.
Added input_transcript in the output shape (so it's mandatory).
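
As a rough sketch of why mandatory transcripts help, the chat UI could extend its text-only history from a finished task like this. It is written in Python for illustration, not taken from the PR. Only the `input_transcript` key comes from this thread; the `output_transcript` and `output` keys and the `append_turn` helper are assumptions.

```python
# Text fields the history depends on. "input_transcript" is named in the thread;
# "output_transcript" is an assumed name for the mandatory text answer.
REQUIRED_TEXT_KEYS = ("input_transcript", "output_transcript")


def append_turn(history: list[str], task_output: dict) -> list[str]:
    """Return a new history with the user's and the assistant's text appended."""
    missing = [key for key in REQUIRED_TEXT_KEYS if not task_output.get(key)]
    if missing:
        # With optional transcripts this is where an extra transcription
        # task would have to be scheduled instead of failing.
        raise ValueError("task output is missing: " + ", ".join(missing))
    # Keep the order "user message, then assistant message".
    return history + [task_output[key] for key in REQUIRED_TEXT_KEYS]


history = append_turn(
    [],
    {
        "input_transcript": "What time is it?",
        "output_transcript": "It is noon.",
        "output": b"<audio bytes>",  # assumed key for the audio answer
    },
)
```

With both text fields guaranteed by the output shape, the history stays a plain list of texts that starts with a user message, and no follow-up transcription task is needed.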

@marcelklehr marcelklehr merged commit 58a3710 into master Jul 7, 2025
194 checks passed
@marcelklehr marcelklehr deleted the enh/noid/taskpro-audio-chat branch July 7, 2025 12:31
@skjnldsv skjnldsv mentioned this pull request Aug 19, 2025
@skjnldsv skjnldsv removed this from the Nextcloud 32 milestone Sep 28, 2025
@skjnldsv skjnldsv modified the milestones: Nextcloud 33, Nextcloud 32 Sep 28, 2025
6 participants