In addition to its Gemini family of AI models, Google also offers Gemma, a family of lightweight open models. Today, the company announced Gemma 2, the next generation, built on a new architecture designed for breakthrough performance and efficiency.
Gemma 2 is available in two sizes: 9 billion (9B) and 27 billion (27B) parameters. As expected, this new generation is more efficient at inference and delivers better performance than the first Gemma model. Google claims the 27B model delivers performance comparable to models more than twice its size, while the 9B model outperforms Llama 3 8B and other similarly sized open models. In the coming months, Google plans to release a 2.6B-parameter Gemma 2 model better suited to smartphone AI scenarios.
The new Gemma 2 models can run on a single NVIDIA A100 80GB Tensor Core GPU, a single NVIDIA H100 Tensor Core GPU, or a single TPU host, reducing AI infrastructure costs. You can even run Gemma 2 on NVIDIA RTX or GeForce RTX desktop GPUs via Hugging Face Transformers. Starting next month, Google Cloud customers will be able to deploy and manage Gemma 2 on Vertex AI. Developers can now try the new Gemma 2 models in Google AI Studio.
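For developers who want to try the desktop-GPU route, below is a minimal sketch of loading a Gemma 2 checkpoint with Hugging Face Transformers. The model ID google/gemma-2-9b-it and the settings shown are assumptions for illustration; check the official Gemma model card for the exact identifier, license terms, and memory requirements.

```python
# Minimal sketch: running a Gemma 2 checkpoint locally with Hugging Face Transformers.
# The model ID below is an assumption based on Hugging Face naming conventions;
# verify it against the official Gemma model card before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # assumed instruction-tuned 9B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit a single desktop GPU
    device_map="auto",           # place weights on the available GPU automatically
)

prompt = "Explain the difference between open-weight and closed AI models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the 9B model's weights alone occupy roughly 18 GB in bfloat16, so a high-end RTX card or 8-bit/4-bit quantization (for example, via bitsandbytes) may be needed on smaller GPUs.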
While training Gemma 2, Google filtered the pre-training data and tested and evaluated the model against a comprehensive set of safety metrics to identify and mitigate potential biases and risks.
Google is making Gemma 2 available free of charge through Kaggle and the Colab free tier. Academic researchers can also apply to the Gemma 2 Academic Research Program to receive Google Cloud credits.
The combination of high performance, efficiency, and accessibility makes Gemma 2 a game-changer in the open-source AI landscape. Google's commitment to open access and responsible AI development sets a positive example for the future of artificial intelligence.
Source: Google