Added support for gpt4o-realtime models for Speech to Speech interactions #659
…ions - Added detailed documentation for the new `RealtimeVoicePipeline`, including usage examples and event handling for real-time audio interaction. - Introduced a new example script demonstrating the `RealtimeVoicePipeline` with continuous audio streaming and tool execution.
8bcb389 to b8899f7
Thank you so much for the PR @sharananurag998! I'll try to look at the PR later this week. Thank you for your patience.
I haven't found a way to do native speech-to-speech integration with an agent, but we can define an agent and use it as a tool in the real-time speech pipeline, and it works! The agent-as-tool approach gives better latency than the STT-TTS-based VoicePipeline. This branch also has Juspay-specific MCP tool handling changes, since we're using the fork as a Python dependency; I'll move those to a separate branch so that main can be merged. @dkundel-openai, you can review the new pipeline and let me know of any changes. I'll be happy to work on them.
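For readers who want a concrete picture of the agent-as-tool idea, here is a minimal sketch, not the code in this branch. It assumes the Realtime API's function-tool schema and its `response.output_item.done`, `conversation.item.create`, and `response.create` events. The names `run_support_agent` and `ask_support_agent` are hypothetical stand-ins for wherever the real agent would be invoked.

```python
import json
from typing import Callable, Dict, List


def run_support_agent(question: str) -> str:
    """Hypothetical stand-in: a real setup would run the text agent here."""
    return f"(agent answer to: {question})"


# Function-tool entry that would go into the realtime session's "tools" list.
AGENT_TOOL = {
    "type": "function",
    "name": "ask_support_agent",  # hypothetical tool name
    "description": "Delegate a question to the support agent and return its answer.",
    "parameters": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}

# Maps tool names to the local callables that back them.
HANDLERS: Dict[str, Callable[[str], str]] = {"ask_support_agent": run_support_agent}


def handle_output_item_done(event: dict) -> List[dict]:
    """Turn a finished function_call item into the client events to send back.

    Returns an empty list for items that are not function calls.
    """
    item = event.get("item", {})
    if item.get("type") != "function_call":
        return []
    args = json.loads(item["arguments"])  # arguments arrive as a JSON string
    output = HANDLERS[item["name"]](args["question"])
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": output,
            },
        },
        # Ask the model to continue speaking now that the tool result is available.
        {"type": "response.create"},
    ]
```

The returned events would be serialized and sent over the realtime connection. The speech model stays in the loop for audio, and the agent only runs when the model decides to call the tool.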
Hi everyone, any news on this pull request, or a general timeline for integrating the Realtime API? I'm very interested in using it with the SDK agent, and I was wondering whether to write my own code or wait for it to be integrated directly. Thanks!
This PR is stale because it has been open for 10 days with no activity.
I second this! Would love to have realtime STS support in this SDK.
No news, but I think that on top of that the new Voice Live API has been released (both are still in beta), so I don't think they will integrate this unless these APIs are released in a stable version.
Darn, I was waiting on exactly this. Does it also do handoffs or do you have to use agent.as_tool()?
This PR introduces real-time voice pipeline support for OpenAI’s `gpt-4o-realtime-preview` model, enabling seamless, low-latency speech-to-speech interactions in the Speect framework. The update brings a modern, streaming audio interface, integrated tool execution, and robust event handling, while maintaining full compatibility with the existing STT/TTS pipeline.

Key Features & Changes

- RealtimeVoicePipeline:
- Integrated Tool Calls:
- Event Handling & Debugging: (… `conversation.item.input_audio_transcription.delta` and `.completed`)
- Echo & Feedback Mitigation: … `input_audio_noise_reduction` in the session config.
- Sample Rate Fixes:
- Backwards Compatibility:
- Documentation & Examples: `docs/voice/pipeline.md` with new real-time usage, configuration, and troubleshooting sections. `continuous_realtime_assistant.py` demonstrates push-to-talk, tool calls, and event handling.

🛠️ How to Use

See the new example and documentation for how to use `RealtimeVoicePipeline` with your OpenAI API key and tools.

No changes required; existing STT/TTS flows are unaffected.
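
As a rough illustration of the session config and transcription events mentioned above, the sketch below builds a `session.update` payload with `input_audio_noise_reduction` enabled and collects `conversation.item.input_audio_transcription.delta` / `.completed` events into transcripts. It does not use `RealtimeVoicePipeline` itself, whose interface is not shown here. The `TranscriptCollector` class, the `near_field` setting, and the `gpt-4o-transcribe` model choice are assumptions made for the example.

```python
import json
from collections import defaultdict


def build_session_update(noise_reduction: str = "near_field") -> str:
    """Serialize a session.update event (field values are illustrative assumptions)."""
    event = {
        "type": "session.update",
        "session": {
            "input_audio_format": "pcm16",
            # Noise reduction helps limit echo from speakers feeding back into the mic.
            "input_audio_noise_reduction": {"type": noise_reduction},
            # Assumed transcription model; needed to receive transcription events.
            "input_audio_transcription": {"model": "gpt-4o-transcribe"},
        },
    }
    return json.dumps(event)


class TranscriptCollector:
    """Hypothetical helper: accumulates transcription deltas per conversation item."""

    def __init__(self) -> None:
        self._partial = defaultdict(str)
        self.finished = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type", "")
        if kind == "conversation.item.input_audio_transcription.delta":
            # Append the incremental text for this item.
            self._partial[event["item_id"]] += event["delta"]
        elif kind == "conversation.item.input_audio_transcription.completed":
            # The completed event carries the full transcript; drop the partial.
            self.finished[event["item_id"]] = event["transcript"]
            self._partial.pop(event["item_id"], None)
        # Any other event type is ignored by this collector.

    def partial(self, item_id: str) -> str:
        return self._partial.get(item_id, "")
```

A caller would send the string from `build_session_update()` once after connecting, then pass every decoded server event to `TranscriptCollector.handle`.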