Google today kicked off its Gemini 2.0 era with the new Gemini 2.0 Flash model, which the company claims outperforms Gemini 1.5 Pro on key benchmarks while running twice as fast.
Beyond the improved performance and lower latency, Gemini 2.0 Flash adds native support for multimodal output, including natively generated images mixed with text and steerable, multilingual text-to-speech (TTS) audio. It also accepts multimodal inputs such as images, video, and audio, and it can natively call tools, including Google Search and code execution.
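For a sense of what native tool calling looks like in practice, here is a minimal sketch of querying Gemini 2.0 Flash with the Google Search tool enabled via Google's google-genai Python SDK. The experimental model name gemini-2.0-flash-exp and the placeholder API key are assumptions for illustration, not a definitive reference.

```python
# Minimal sketch: Gemini 2.0 Flash with the native Google Search tool.
# Requires the google-genai SDK (pip install google-genai).
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key, assumption

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental model name, assumption
    contents="Summarize today's top AI announcements.",
    config=types.GenerateContentConfig(
        # Declaring the tool lets the model decide at inference time
        # whether a given prompt needs a live search.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```

Because the tool is merely declared in the request config, the model itself chooses whether to invoke Search for a given prompt rather than the developer wiring up retrieval by hand.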
Developers can try out an experimental version of Gemini 2.0 Flash in AI Studio and Vertex AI today, along with the new Multimodal Live API, which supports real-time audio and video-streaming input and the ability to use multiple, combined tools.
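The snippet below is a rough sketch of what a Multimodal Live API session looks like from code: it opens a streaming connection with the google-genai SDK and exchanges a single text turn. The session methods shown (connect, send, receive) reflect the SDK at the time of the experimental launch and should be treated as assumptions.

```python
# Rough sketch of a Multimodal Live API session (google-genai SDK).
# The Live API streams over WebSockets; audio and video input go through
# the same session object. Method names are assumptions, not a reference.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key, assumption

async def main():
    # Ask the model to reply with text; "AUDIO" is also a supported modality.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Hello, Gemini 2.0!", end_of_turn=True)
        # Stream the model's response chunks as they arrive.
        async for chunk in session.receive():
            if chunk.text:
                print(chunk.text, end="")

asyncio.run(main())
```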
For consumers, the new model is available via the Gemini experience on the desktop and mobile web, with support in the mobile apps coming soon. General availability of Gemini 2.0 Flash is planned for January 2025.
Along with Gemini 2.0 Flash, Google also announced several prototypes that explore the agentic capabilities of Gemini 2.0:
- Project Astra can now converse in multiple and mixed languages, holds up to 10 minutes of in-session memory, and can use Google Search, Lens, and Maps.
- Project Mariner is an AI agent that understands and reasons across information on your browser screen to complete tasks. Google claims Project Mariner achieved a state-of-the-art result of 83.5% on the WebVoyager benchmark working as a single-agent setup.
- Jules is an AI-powered code agent that integrates directly into a GitHub workflow to fix an issue, develop a plan, and execute it.
With its multimodal capabilities and native tool integration, Gemini 2.0 Flash opens up exciting possibilities for developers and consumers alike.