Yesterday, OpenAI announced its newest frontier model, officially named GPT-4o-2024-08-06. This new model comes with Structured Outputs support in the API, which ensures that model-generated outputs will exactly match JSON Schemas provided by developers. Along with the Structured Outputs support, OpenAI also reduced the API usage price. The new gpt-4o-2024-08-06 model now costs $2.50 per 1 million input tokens and $10.00 per 1 million output tokens. This updated price is 50% cheaper for input tokens and 33% cheaper for output tokens compared to gpt-4o-2024-05-13.
Today, Microsoft announced the availability of the GPT-4o-2024-08-06 model on Azure. Unfortunately, Microsoft did not reveal its pricing for this new model, but it will be updated soon.
Structured Outputs in OpenAI"s models will be available in two forms:
- User-defined JSON Schema: This option allows developers to specify the exact JSON Schema they want the AI to follow, supported by both GPT-4o-2024-08-06 and GPT-4o-mini-2024-07-18.
- More Accurate Tool Output ("Strict Mode"): This limited version lets developers define specific function signatures for tool use, supported by all models that support function calling, including GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, and GPT-4o models from June 2023 onwards.
Even though OpenAI didn"t mention performance improvements for the gpt-4o-2024-08-06 model, it seems to be performing better than gpt-4o-2024-05-13.
According to LiveBench, a benchmark for LLMs designed with test set contamination and objective evaluation in mind, the gpt-4o-2024-08-06 model scored an average of 56.71, up from the gpt-4o-2024-05-13 model"s 54.63. According to ZeroEval Leaderboard, gpt-4o-2024-08-06 tops the leaderboard with an average score of 88.5275. The gpt-4o-2024-08-06 model is not yet listed in LMSYS Chatbot Arena. Currently, Gemini-1.5-Pro-Exp-0801 is leading the leaderboard with a record 1299 Arena score.
With the launch of gpt-4o-2024-08-06 model, OpenAI continues to push the boundaries of AI model capabilities, offering developers more control and affordability. While early benchmarks suggest promising performance improvements, the true test of its capabilities will lie in real-world AI applications.