Google today announced Gemini 3.1 Flash Live, its highest-quality audio and voice model yet, powering major upgrades to Gemini Live and Search Live.
Available in preview today via the Gemini Live API in Google AI Studio, the model is better at recognizing acoustic nuances like pitch and pace, and offers lower latency than 2.5 Flash Native Audio.
Key improvements include:
- Better speech recognition: More effective at discerning relevant speech from environmental sounds like traffic or television
- Background noise filtering: Improved ability to filter out background noise during conversations
- Tool triggering: Significantly improved ability to trigger external tools and deliver information during live conversations
- Instruction following: Better adherence to complex system instructions, keeping agents within operational guardrails
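To make the tool-triggering and instruction-following points concrete, here is a rough sketch of the kind of session config the Live API accepts, pairing a system instruction (the guardrails) with a tool declaration the model can trigger mid-conversation. Plain dicts are used rather than any SDK's typed classes, and the weather tool is hypothetical, not from Google's documentation:

```python
# Hypothetical sketch: the tool name and guardrail text below are
# illustrative, not taken from Google's docs.

def build_live_config(system_instruction: str) -> dict:
    """Assemble a Live-style session config with one example tool declaration."""
    weather_tool = {
        "function_declarations": [{
            "name": "get_weather",  # hypothetical external tool
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "OBJECT",
                "properties": {"city": {"type": "STRING"}},
                "required": ["city"],
            },
        }]
    }
    return {
        "response_modalities": ["AUDIO"],       # voice-first session
        "system_instruction": system_instruction,  # keeps the agent in guardrails
        "tools": [weather_tool],                # tools the model may trigger
    }

config = build_live_config("Answer briefly; decline requests outside travel topics.")
```

During a live session, the model decides when to emit a call to a declared tool; the client executes it and streams the result back, which is exactly the "deliver information during live conversations" behavior described above.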
Global Search Live Expansion
Google is also using Gemini 3.1 Flash Live to roll out Search Live globally in over 200 countries. This includes audio and video capabilities (via Google Lens) for back-and-forth conversations with Google Search.
In Gemini Live on Android and iOS, 3.1 Flash Live delivers faster responses with fewer awkward pauses and can follow conversation threads twice as long. The model dynamically adjusts answer length and tone to match the context.
Multimodal Conversations
The model supports real-time multimodal conversations in over 90 languages, making it one of the most versatile voice AI models available. This positions Google to compete directly with OpenAI's voice models and other real-time audio offerings.
For developers building voice-enabled AI agents, this is a significant step forward in natural-language interaction: the reduced latency and improved context handling make the model suitable for complex multi-turn conversations.
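One client-side implication of longer conversation threads is managing how much recent context travels with each turn. The following is a minimal, hypothetical helper for that bookkeeping, not part of any Google SDK; the turn limit and shape of the records are assumptions for illustration:

```python
from collections import deque


class TurnBuffer:
    """Hypothetical rolling context window for a multi-turn voice session.

    Keeps only the last `max_turns` user/model exchanges so each new
    request carries recent conversational context without growing unboundedly.
    """

    def __init__(self, max_turns: int = 10):
        # deque with maxlen silently discards the oldest turn when full
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append({"role": role, "text": text})

    def context(self) -> list[dict]:
        """Return the retained turns, oldest first."""
        return list(self.turns)


buf = TurnBuffer(max_turns=3)
for i in range(5):
    buf.add("user", f"question {i}")
# only the 3 most recent turns remain: questions 2, 3, 4
```

A fixed-size deque is the simplest policy; a production agent might instead trim by token count or summarize older turns, but the idea of bounding the context sent per turn is the same.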
Call to Action
Building voice-enabled AI agents? OpenClawHosting offers managed infrastructure for deploying AI agents with monitoring and scaling built in.