Meta's new multimodal Llama 3.2 models now available on Microsoft Azure and Google Cloud

At Connect 2024, Meta founder and CEO Mark Zuckerberg announced Llama 3.2. The release includes small and medium-sized vision LLMs (11B and 90B parameters) and two on-device, text-only models (1B and 3B parameters). The 11B and 90B vision models are Llama’s first multimodal models.

Microsoft has announced that the Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct models are now available in the Azure AI Model Catalog. Inferencing through Models-as-a-Service serverless APIs is coming soon for these models.

The following Llama 3.2 models are available via managed compute inferencing on Azure:

Llama 3.2 1B

Llama 3.2 3B

Llama 3.2 1B Instruct

Llama 3.2 3B Instruct

Llama Guard 3 1B

Llama 3.2 11B Vision Instruct

Llama 3.2 90B Vision Instruct

Llama Guard 3 11B Vision

Fine-tuning is currently available only for Llama 3.2 1B Instruct and 3B Instruct; Microsoft plans to bring it to other Llama 3.2 model collections in the coming months. These models come with rate limits of 200k tokens per minute and 1k requests per minute. Developers who need higher limits can contact the Microsoft team to request an increase.
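For developers who want to stay under those limits on the client side, the following is a minimal sketch of a throttle that tracks both budgets. It is not part of any Azure SDK: the `MinuteRateLimiter` class and its method names are hypothetical, and the sliding 60-second window is an assumption, since the announcement does not say how the service measures a minute or counts tokens.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Hypothetical client-side limiter: caps requests and tokens per window.

    Defaults mirror the limits quoted above (1k requests, 200k tokens per
    minute). The sliding-window behaviour is an assumption, not documented
    service behaviour.
    """

    def __init__(self, max_requests=1_000, max_tokens=200_000,
                 window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window = window
        self.clock = clock
        self.events = deque()       # (timestamp, tokens) for each accepted request
        self.tokens_in_window = 0   # running token total for accepted requests

    def _evict(self, now):
        # Drop requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Return True and record the request if it fits both budgets."""
        if tokens > self.max_tokens:
            raise ValueError("a single request exceeds the per-window token budget")
        now = self.clock()
        self._evict(now)
        if len(self.events) >= self.max_requests:
            return False
        if self.tokens_in_window + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

A caller would estimate the token count of each request, call `try_acquire` before sending it, and wait or back off when it returns `False`. The service enforces its own limits regardless, so a sketch like this only reduces the number of rejected calls.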

Google also announced that Llama 3.2 models are now available on Vertex AI Model Garden, where all four are ready for self-service deployment. However, only the Llama 3.2 90B model is currently available in preview through Google's Model-as-a-Service (MaaS) offering.

Along with the Llama 3.2 models, Meta also announced the release of Llama Stack distributions. These distributions will simplify the way developers use Llama models in different environments, including single-node, on-premises, cloud, and on-device. The Meta team released the following:

Llama CLI (command-line interface) to build, configure, and run Llama Stack distributions

Client code in multiple languages, including Python, Node.js, Kotlin, and Swift

Docker containers for Llama Stack Distribution Server and Agents API Provider

Multiple distributions:

Single-node Llama Stack Distribution via Meta internal implementation and Ollama

Cloud Llama Stack distributions via AWS, Databricks, Fireworks, and Together

On-device Llama Stack Distribution on iOS implemented via PyTorch ExecuTorch

On-premises Llama Stack Distribution supported by Dell

The release of Llama 3.2 models and Llama Stack distributions marks a significant step in making powerful AI models more accessible to developers across cloud, on-premises, and on-device environments. This is likely to encourage wider adoption of AI across industries.