Since its release, the Gemini 1.5 Flash model has quickly become popular among developers due to its speed and cost-effectiveness. In August, Google announced Gemini 1.5 Flash 8B, a new experimental AI model that further reduces costs by using only 8 billion parameters.
After testing it with developers over the past few weeks, Google today announced the production-ready release of Gemini 1.5 Flash 8B, which can be used for high-volume multimodal use cases, long-context summarization tasks, and more. Compared to the original 1.5 Flash, the new 1.5 Flash 8B model costs 50% less, offers 2x higher rate limits, and comes with lower latency on small prompts.
In terms of performance, 1.5 Flash 8B nearly matches the 1.5 Flash model launched in May. It is particularly well-suited for tasks such as chat, transcription, and long-context language translation. A comparison of the new model's benchmarks is available below.
The main highlight of this new 8B model is its cost. Gemini 1.5 Flash 8B is the cheapest AI model from Google to date. You can find the full pricing below:
- $0.0375 per 1 million input tokens (prompts)
- $0.15 per 1 million output tokens
- $0.01 per 1 million cached prompt tokens
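To put those figures in context, here is a minimal back-of-the-envelope cost calculation; the token volumes below are purely illustrative and not from Google's announcement.

```python
# Illustrative cost estimate for Gemini 1.5 Flash 8B, using the per-token
# rates listed above. Workload numbers are hypothetical, chosen only to
# show the arithmetic: $1.875 + $1.50 + $0.20 = $3.575 for this example.

INPUT_RATE = 0.0375 / 1_000_000   # USD per input (prompt) token
OUTPUT_RATE = 0.15 / 1_000_000    # USD per output token
CACHED_RATE = 0.01 / 1_000_000    # USD per cached prompt token

input_tokens = 50_000_000    # hypothetical monthly prompt volume
output_tokens = 10_000_000   # hypothetical monthly output volume
cached_tokens = 20_000_000   # hypothetical cached prompt volume

total = (input_tokens * INPUT_RATE
         + output_tokens * OUTPUT_RATE
         + cached_tokens * CACHED_RATE)
print(f"Estimated monthly cost: ${total:.2f}")
```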
To support high-volume AI applications, Google is also increasing the rate limit for this new model. Gemini 1.5 Flash 8B now supports 4,000 requests per minute (RPM), which is double the previous limit.
Interested developers can now try out the new gemini-1.5-flash-8b model for free via Google AI Studio and the Gemini API. For paid-tier developers, billing for this new model starts on Monday, October 14th.
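For those who want to try it quickly, here is a minimal sketch using the google-generativeai Python SDK; it assumes the package is installed, an API key from Google AI Studio is available, and the environment variable name and prompt text are placeholders.

```python
# Minimal sketch: calling gemini-1.5-flash-8b via the Gemini API (Python SDK).
# Assumes `pip install google-generativeai` and an API key from Google AI Studio.
import os
import google.generativeai as genai

# GEMINI_API_KEY is an assumed environment variable name for this example.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash-8b")
response = model.generate_content(
    "Summarize the key points of this meeting transcript: ..."  # placeholder prompt
)
print(response.text)
```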
With its impressive combination of performance, affordability, and accessibility, Google's new Gemini 1.5 Flash 8B model is poised to become a popular AI model choice for developers across various domains.
Source: Google