In June, Google announced Gemma 2, the next generation of its open language models, built on a new architecture designed for high performance and efficiency. Until yesterday, Gemma 2 was available in two sizes: 9 billion (9B) and 27 billion (27B) parameters. Yesterday, Google expanded the Gemma 2 family by announcing Gemma 2 2B, a 2-billion-parameter model.
The new Gemma 2 2B claims to deliver best-in-class performance for its size, even beating all GPT-3.5 models on the Chatbot Arena with a score of 1126. It can also run efficiently on a wide range of hardware, from PCs and edge devices to cloud deployments on Google Cloud Vertex AI. Google has optimized the model with the NVIDIA TensorRT-LLM library, so it runs on platforms using NVIDIA RTX and NVIDIA GeForce RTX GPUs as well as NVIDIA Jetson modules, and developers can use it as an NVIDIA NIM (NVIDIA Inference Microservices). Additionally, Gemma 2 2B integrates with Keras, JAX, Hugging Face, NVIDIA NeMo, Ollama, Gemma.cpp, and soon MediaPipe for easy development.
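As a concrete illustration of the Hugging Face integration mentioned above, the following is a minimal sketch of running the instruction-tuned checkpoint with the Transformers library. It assumes the transformers and torch packages are installed and that your Hugging Face account has accepted the Gemma license; the google/gemma-2-2b-it repo id is the public instruction-tuned 2B checkpoint.

```python
# Minimal sketch: generating text with Gemma 2 2B via Hugging Face Transformers.
# Assumes `pip install transformers torch` and an accepted Gemma license on the Hub.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-2-2b-it",  # instruction-tuned 2B checkpoint
    torch_dtype=torch.bfloat16,
    device_map="auto",             # uses a GPU if available, otherwise CPU
)

messages = [{"role": "user", "content": "Explain what an open-weight model is."}]
output = generator(messages, max_new_tokens=128)
# The pipeline returns the conversation with the assistant's reply appended.
print(output[0]["generated_text"][-1]["content"])
```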
Gemma 2"s model weights are available for download from Kaggle, Hugging Face, and the Vertex AI Model Garden. Unlike the Google Gemini models, Gemma 2 is available under commercially-friendly licensing. Along with Gemma 2, Google also announced ShieldGemma safety content classifier models and Gemma Scope model interpretability tool.
In April, Microsoft announced the Phi-3 family of language models, which competes directly with Google's Gemma family. The Phi-3 family has three models: Phi-3-mini is a 3.8B language model available in two context-length variants, 4K and 128K tokens; Phi-3-small is a 7B language model available in 8K and 128K token variants; and Phi-3-medium is a 14B language model available in 4K and 128K token variants.
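The context-length variants ship as separate checkpoints. As a small sketch, the configured context window of a given variant can be inspected without downloading the full weights; the repo ids below are the public Microsoft checkpoints on the Hugging Face Hub.

```python
# Minimal sketch: checking the context window of a Phi-3-mini variant.
# Assumes `pip install transformers`; older transformers versions need
# trust_remote_code=True because the Phi-3 repos ship custom modeling code.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",  # or "microsoft/Phi-3-mini-4k-instruct"
    trust_remote_code=True,
)
print(cfg.max_position_embeddings)  # 131072 for the 128K variant
```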
The emergence of smaller yet powerful language models like Google's Gemma 2 2B and Microsoft's Phi-3 family signals a growing trend in the AI industry: a shift toward smaller models that prioritizes accessibility and efficiency, enabling deployment on a wider range of devices and lowering computational costs.
Source: Google