[TaskProcessing] Add audio-to-audio chat task type #53759
Conversation
'history' => new ShapeDescriptor(
	$this->l->t('Chat history'),
	$this->l->t('The history of chat messages before the current message, starting with a message by the user'),
	EShapeType::ListOfTexts
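For reference, the excerpt above is only the 'history' entry. Below is a hedged sketch of what the full input shape of such a task type class could look like; apart from 'history', the class name, field names and descriptions are assumptions modeled on the existing TextToTextChat task type, not the actual code of this PR.

use OCP\IL10N;
use OCP\TaskProcessing\EShapeType;
use OCP\TaskProcessing\ShapeDescriptor;

// Hypothetical task type class, names are illustrative
class AudioToAudioChat {
	public function __construct(
		private IL10N $l,
	) {
	}

	public function getInputShape(): array {
		return [
			'system_prompt' => new ShapeDescriptor(
				$this->l->t('System prompt'),
				$this->l->t('Define rules and assumptions that the assistant should follow during the conversation.'),
				EShapeType::Text
			),
			'input' => new ShapeDescriptor(
				$this->l->t('Chat voice message'),
				$this->l->t('Describe a task that you want the assistant to do or ask a question'),
				EShapeType::Audio
			),
			'history' => new ShapeDescriptor(
				$this->l->t('Chat history'),
				$this->l->t('The history of chat messages before the current message, starting with a message by the user'),
				EShapeType::ListOfTexts
			),
		];
	}
}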
Where do we get the texts from?
From the chat conversation history in the chat UI.
Ideally we would need a mixed list of text and audio, but since we will get the transcription of audio responses (as an optional output), my plan is to store it in a chat message so the history can stay text-only for now. Wdyt?
I see. But if the transcription is an optional output, we may not get it when other providers implement this task type. Do we schedule another transcription task then?
I think so, yes.
I even think the text output could be mandatory and part of the task type output shape (because we need it for the history). Wdyt?
Yeah, I was thinking that as well, let's make it mandatory
The input transcription is optional though. Or is it? We need it too for the history.
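To illustrate where this thread is heading, here is a hedged sketch of an output shape in which both transcripts are mandatory alongside the audio answer; the field names are assumptions, not necessarily the shape merged in this PR.

	public function getOutputShape(): array {
		return [
			'output' => new ShapeDescriptor(
				$this->l->t('Response voice message'),
				$this->l->t('The generated audio response of the assistant'),
				EShapeType::Audio
			),
			'output_transcript' => new ShapeDescriptor(
				$this->l->t('Response transcript'),
				$this->l->t('Transcription of the audio response, needed to keep a text-only chat history'),
				EShapeType::Text
			),
			'input_transcript' => new ShapeDescriptor(
				$this->l->t('Input transcript'),
				$this->l->t('Transcription of the audio input, also needed for the chat history'),
				EShapeType::Text
			),
		];
	}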
Force-pushed from a549444 to b9c1fc0
Force-pushed from b9c1fc0 to cce12a9
kyteinsky left a comment
🚀
Signed-off-by: Julien Veyssier <[email protected]>
Force-pushed from cce12a9 to af059cb
Rebased on master.
This is going towards making it possible to have a voice interaction with the Assistant.
The Assistant chat interface will schedule such audio-to-audio chat tasks if Agency is not available.
This does not cover Agency for now.
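As a rough illustration (not code from this PR), scheduling such a task from the Assistant backend could look like the sketch below; the task type ID, input keys and app ID are assumptions.

use OCP\TaskProcessing\IManager;
use OCP\TaskProcessing\Task;

// $manager is an injected OCP\TaskProcessing\IManager instance.
// 'core:audio2audio:chat' is a hypothetical ID for the new task type.
$task = new Task(
	'core:audio2audio:chat',
	[
		'system_prompt' => $systemPrompt,
		'input' => $audioFileId,   // audio input referenced as a file ID
		'history' => $history,     // text-only history, see discussion above
	],
	'assistant',                   // requesting app
	$userId,
);
$manager->scheduleTask($task);
// The Assistant can then pick up the audio answer and the transcripts
// from the finished task, e.g. via IManager::getTask() or task events.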
Do we make a ContextAgentVoiceInteraction task type that takes voice as input?
We just need to support voice input and voice output. In between, the interactions between the agent and the model are text-based. So the Context Agent app would most likely still schedule TextToTextChatWithTools tasks.
We can also take the easy road and do three steps when Agency is involved: STT -> Context Agent -> TTS. That way we would need no change in Context Agent and no new task type.
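A hedged sketch of that three-step chain using existing task types; the Context Agent and text-to-speech task type IDs and input keys are assumptions, and in practice each step would only be scheduled once the previous task has finished.

use OCP\TaskProcessing\Task;
use OCP\TaskProcessing\TaskTypes\AudioToText;

// 1. Speech-to-text on the user's voice message
$stt = new Task(AudioToText::ID, ['input' => $audioFileId], 'assistant', $userId);
$manager->scheduleTask($stt);

// 2. Hand the transcript to the Context Agent once the STT task is done
//    (the 'core:contextagent:interaction' ID and its input keys are assumptions)
$agent = new Task('core:contextagent:interaction', ['input' => $transcript], 'assistant', $userId);
$manager->scheduleTask($agent);

// 3. Text-to-speech on the agent's text answer
$tts = new Task('core:text2speech', ['input' => $agentAnswer], 'assistant', $userId);
$manager->scheduleTask($tts);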
To be discussed.