Early this year, OpenAI announced GPT-4o, its most advanced multimodal model. It is faster and cheaper than GPT-4 Turbo and also delivers stronger vision capabilities. Last month, OpenAI announced its cost-effective model, GPT-4o mini. Today, OpenAI launched the ability to fine-tune GPT-4o and GPT-4o mini, one of the most requested features from developers.
With fine-tuning support, developers can train GPT-4o and GPT-4o mini on custom datasets to achieve higher performance at lower cost for specific use cases. Fine-tuning lets developers change the tone of the model's responses or even train the model to follow complex domain-specific instructions.
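For reference, training data for OpenAI's chat models is supplied as a JSONL file in which each line is one chat-formatted example. The sketch below shows one way to prepare such a file in Python; the file name and example content are purely illustrative.

```python
import json

# Each training example is one JSON line in OpenAI's chat fine-tuning format:
# a "messages" list of system/user/assistant turns. Content here is illustrative.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a support bot that answers in a formal tone."},
        {"role": "user", "content": "Where is my order?"},
        {"role": "assistant", "content": "Your order is scheduled for delivery on Friday."},
    ]},
]

# Write one JSON object per line.
with open("training_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

A real training set would contain many such examples, all following the same conversational pattern you want the fine-tuned model to learn.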
GPT-4o fine-tuning and GPT-4o mini fine-tuning are available today to all developers on all paid usage tiers. OpenAI is offering every organization 1M free training tokens per day for GPT-4o fine-tuning and 2M free training tokens per day for GPT-4o mini, through September 23. GPT-4o fine-tuning training costs $25 per million tokens, and inference with the fine-tuned model costs $3.75 per million input tokens and $15 per million output tokens.
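As a rough back-of-envelope illustration of those GPT-4o rates (the token counts below are hypothetical, and actual billed training tokens depend on your file size and training settings):

```python
def gpt4o_fine_tune_cost(training_tokens: int, input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in dollars using the GPT-4o fine-tuning rates quoted above."""
    TRAIN_PER_M = 25.00    # $ per 1M training tokens
    INPUT_PER_M = 3.75     # $ per 1M input tokens at inference
    OUTPUT_PER_M = 15.00   # $ per 1M output tokens at inference
    return (training_tokens / 1e6 * TRAIN_PER_M
            + input_tokens / 1e6 * INPUT_PER_M
            + output_tokens / 1e6 * OUTPUT_PER_M)

# A 2M-token training run plus 1M input / 0.5M output tokens of inference:
# 2 * 25 + 1 * 3.75 + 0.5 * 15 = $61.25
print(gpt4o_fine_tune_cost(2_000_000, 1_000_000, 500_000))  # 61.25
```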
To get started, go to the fine-tuning dashboard on the OpenAI platform, click "Create", and select gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18 from the base model drop-down.
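Alternatively, a fine-tuning job can be started programmatically. The minimal sketch below uses the official openai Python SDK, and assumes the training_data.jsonl file from the earlier example and an OPENAI_API_KEY set in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL training file prepared earlier.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job against one of the newly supported base models.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # or "gpt-4o-2024-08-06"
)
print(job.id, job.status)
```

Once the job completes, the resulting fine-tuned model ID can be used in chat completion requests like any other model.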
OpenAI also highlighted that select partners who tested GPT-4o fine-tuning saw strong results. Two of them are listed below.
- Distyl recently placed 1st on the BIRD-SQL benchmark, the leading text-to-SQL benchmark. Distyl’s fine-tuned GPT-4o achieved an execution accuracy of 71.83% on the leaderboard and excelled across tasks like query reformulation, intent classification, chain-of-thought, and self-correction, with particularly high performance in SQL generation.
- Using a fine-tuned GPT-4o model, Genie achieved a SOTA score of 43.8% on the new SWE-bench Verified benchmark. Genie also holds the SOTA score of 30.08% on SWE-bench Full, up from its previous SOTA of 19.27%, the largest improvement this benchmark has seen.
You can learn more about fine-tuning OpenAI models here.