Google slashes Gemini 1.5 Flash prices, igniting LLM price war

In May, Google announced the new Gemini 1.5 Flash model, optimized for speed and efficiency. The Gemini 1.5 Flash was aggressively priced ($0.35 per million input tokens and $1.05 per million output tokens) compared to other frontier models, including Google's own Gemini 1.5 Pro. Last month, OpenAI announced the new GPT-4o Mini model to compete directly against Gemini 1.5 Flash, undercutting its pricing at $0.15 per million input tokens and $0.6 per million output tokens.

Today, Google reduced the price of the Gemini 1.5 Flash model by about 80%, effective August 12, 2024. The new cost will be $0.075 per million input tokens and $0.3 per million output tokens, making Gemini 1.5 Flash nearly 50% cheaper than OpenAI's GPT-4o mini. This reduced price along with features like context caching can significantly reduce the cost and latency of long context queries. Batch API calls can further reduce the costs for latency intensive tasks.

Regarding performance, Gemini 1.5 Flash still lags behind GPT-4o mini. As the table below shows, GPT-4o mini outperforms Gemini 1.5 Flash in all top AI benchmarks except MathVista.

While the price reduction is advantageous for developers and enterprises, it poses a significant challenge for smaller AI startups competing against industry giants like Google and OpenAI. Startups whose business models center on developing and serving LLMs via APIs may find it increasingly difficult to remain viable in the current price war. Additionally, the recent release of Meta's open-source LLama 3.1 frontier models further intensifies the competitive landscape. In this evolving market, startups will need to demonstrate significant innovation or differentiation to ensure their long-term sustainability.

Along with the price reduction, Google also announced that the Gemini 1.5 Flash model can now understand and respond in over 100 languages. Additionally, the general availability of provisioned throughput allows developers to scale their usage of models like Gemini 1.5 Flash, ensuring both capacity and price predictability.

Source: Google