Back in September, OpenAI announced the new o1 series of LLMs that are designed to spend more time thinking before they respond. These models are suitable for complex reasoning tasks, and they perform better in science, coding, and math.
Today, Google announced its first reasoning-focused large language model called the Gemini 2.0 Flash Thinking. This new experimental model is available under the name gemini-2.0-flash-thinking-exp-1219 for developers in the Google AI Studio. Google claims that this latest model is best suited for multimodal understanding, reasoning, and coding.
Google mentioned that it saw promising results when it increased inference time computation. Unfortunately, Google did not share any benchmarks on its own to back its claim. But according to Chatbot Arena, Gemini-2.0-Flash-Thinking is now ranked No.1 across all categories.
Breaking news from Chatbot Arena⚡🤔@GoogleDeepMind's Gemini-2.0-Flash-Thinking debuts as #1 across ALL categories!
— lmarena.ai (formerly lmsys.org) (@lmarena_ai) December 19, 2024
The leap from Gemini-2.0-Flash:
- Overall: #3 → #1
- Overall (Style Control): #4 → #1
- Math: #2 → #1
- Creative Writing: #2 → #1
- Hard Prompts: #1 → #1… https://t.co/lO1DiTiOOj pic.twitter.com/cq2MRMbWZ1
Google listed the following use cases in its developer portal to try out the Gemini 2.0 Flash Thinking model:
- Reason over the most complex problems
- Show the thinking process of the model
- Tackle difficult code and math problems
This new model will support a context length greater than 128k, and it comes with a knowledge cut-off of August 2024. Developers can access this new Gemini reasoning model via the Gemini API in Google AI Studio and Vertex AI.
Want to see Gemini 2.0 Flash Thinking in action? Check out this demo where the model solves a physics problem and explains its reasoning. pic.twitter.com/Nl0hYj7ZFS
— Jeff Dean (@JeffDean) December 19, 2024
Early this week, OpenAI announced that its o1 reasoning model is rolling out to developers on usage tier 5 in the API. This updated o1 model delivers state-of-the-art results on several popular AI benchmarks. Developers can use the o1 model to build agentic applications to improve customer support, optimize supply chain decisions, and forecast financial trends.
With these new reasoning-focused LLMs, developers have even more powerful models to build innovative AI applications across various industries.
1 Comment - Add comment