Leaked benchmarks suggest Meta Llama 3.1 405B model may outperform OpenAI's GPT-4o

In April 2024, Meta launched Llama 3, its next generation of state-of-the-art open-source large language models. The first two models, Llama 3 8B and Llama 3 70B, set new benchmarks for LLMs of their size. However, in just three months, several other LLMs have surpassed their performance.

Meta has already revealed that its largest Llama 3 model will have over 400 billion parameters and is still in training. Today, early benchmarks of the upcoming Llama 3.1 8B, 70B, and 405B models leaked on the LocalLLaMA subreddit. The leaked data suggests that Meta Llama 3.1 405B could outperform the current leader, OpenAI's GPT-4o, on several key AI benchmarks. If the numbers hold, this would be a significant milestone for the open-source AI community: the first time an open-source model beats the state-of-the-art closed-source LLM.

Meta said the following during the Llama 3 launch:

"We’re committed to the continued growth and development of an open AI ecosystem for releasing our models responsibly. We have long believed that openness leads to better, safer products, faster innovation, and a healthier overall market. This is good for Meta, and it is good for society."

According to the leaked benchmarks, Meta Llama 3.1 405B outperforms GPT-4o on several tests, including GSM8K, HellaSwag, BoolQ, MMLU-humanities, MMLU-other, MMLU-STEM, and Winograd, among others. However, it falls behind on HumanEval and MMLU-social sciences.

Note that these numbers are for the base models of Llama 3.1. Instruction tuning is needed to unlock a model's full potential, so many of these results may improve with the release of the Instruct versions of the Llama 3.1 models.

While OpenAI's upcoming GPT-5, with its anticipated advanced reasoning capabilities, may challenge Llama 3.1's potential leadership in the LLM space, Llama 3.1's strong performance against GPT-4o still highlights the power and potential of open-source AI development. This continued progress could democratize access to cutting-edge AI technology and accelerate innovation in the technology industry.

Source: Reddit