
Cerebras launches the world's fastest AI inference, 20x the performance of NVIDIA GPUs


Cerebras Systems today announced Cerebras Inference, which it bills as the world's fastest AI inference solution. It delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B, almost 20 times faster than NVIDIA GPU-based inference solutions available in hyperscale clouds, including Microsoft Azure.
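To put those throughput figures in context, here is a rough sketch of how token rate translates into response latency. The 1,800 and 450 tokens/s rates come from the announcement; the 500-token response length is an illustrative assumption.

```python
# Rough latency estimate from the announced token rates.
# Rates are from the Cerebras announcement; the response size is illustrative.

def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a sustained decode rate."""
    return num_tokens / tokens_per_second

# A 500-token answer on Llama 3.1 8B at 1,800 tokens/s:
print(round(generation_seconds(500, 1800), 2))   # 0.28 s

# The same answer on Llama 3.1 70B at 450 tokens/s:
print(round(generation_seconds(500, 450), 2))    # 1.11 s
```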

In addition to its incredible performance, this new inference solution is priced at a fraction of what popular GPU clouds charge. For example, one million tokens cost just 10 cents, which Cerebras claims amounts to 100x higher price-performance for AI workloads.


Cerebras' 16-bit accuracy and 20x faster inference calls will allow AI app developers to build next-generation AI applications without compromising on speed or cost. This price-performance ratio is made possible by the Cerebras CS-3 system and its Wafer Scale Engine 3 (WSE-3) AI processor. The CS-3 has 7,000x more memory bandwidth than the NVIDIA H100, addressing generative AI's memory bandwidth bottleneck.

Cerebras Inference is available in the following three tiers:

  • The Free Tier offers free API access and generous usage limits to anyone who logs in.
  • The Developer Tier, designed for flexible, serverless deployment, provides an API endpoint at a fraction of the cost of alternatives on the market, with the Llama 3.1 8B and 70B models priced at 10 cents and 60 cents per million tokens, respectively.
  • The Enterprise Tier offers fine-tuned models, custom service level agreements, and dedicated support. Ideal for sustained workloads, enterprises can access Cerebras Inference via a Cerebras-managed private cloud or on customer premises.
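A minimal sketch of what the Developer Tier pricing implies for a given workload, using the per-million-token rates quoted above. The model labels and token volumes below are illustrative assumptions, not real usage data, and actual billing may distinguish input from output tokens.

```python
# Cost estimate at the quoted Developer Tier rates.
# Rates (USD per million tokens) are from the article; the model labels
# and token counts are illustrative assumptions.

PRICE_PER_MILLION = {
    "llama3.1-8b": 0.10,   # 10 cents per million tokens
    "llama3.1-70b": 0.60,  # 60 cents per million tokens
}

def estimate_cost(model: str, total_tokens: int) -> float:
    """Estimated cost in USD for total_tokens processed on `model`."""
    return PRICE_PER_MILLION[model] * total_tokens / 1_000_000

# 50 million tokens per day on the 8B model:
print(estimate_cost("llama3.1-8b", 50_000_000))   # 5.0 USD

# The same volume on the 70B model:
print(estimate_cost("llama3.1-70b", 50_000_000))  # 30.0 USD
```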

The Cerebras team said the following regarding the Cerebras Inference launch:

"With record-breaking performance, industry-leading pricing, and open API access, Cerebras Inference sets a new standard for open LLM development and deployment. As the only solution capable of delivering both high-speed training and inference, Cerebras opens entirely new capabilities for AI."

The AI landscape is evolving rapidly, and while NVIDIA currently holds a commanding position in the AI market, the emergence of companies like Cerebras and Groq signals a potential shift in the industry dynamics. As the demand for faster and more cost-effective AI inference solutions intensifies, these challengers are well-positioned to disrupt NVIDIA's dominance, particularly in the inference domain.
