
Microsoft releases Phi-3.5 family of models that outperform competing models


Today, Microsoft announced the release of the Phi-3.5 family of models: Phi-3.5-mini, Phi-3.5-MoE, and Phi-3.5-vision. These lightweight models were trained on a mix of synthetic data and filtered publicly available web data, and all of them support a 128K-token context length. All three models are now available on Hugging Face under the MIT license.
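Since the models are published on Hugging Face, they can be loaded with the `transformers` library. A minimal sketch follows; the repo ids below follow Microsoft's usual naming pattern on the Hub but should be verified on huggingface.co before use, and `load_phi_chat` is an illustrative helper, not official code:

```python
# Hub repo ids for the three announced models (assumed naming; check the Hub).
PHI_35_MODELS = {
    "mini": "microsoft/Phi-3.5-mini-instruct",
    "moe": "microsoft/Phi-3.5-MoE-instruct",
    "vision": "microsoft/Phi-3.5-vision-instruct",
}

def load_phi_chat(model_id: str = PHI_35_MODELS["mini"]):
    """Build a text-generation pipeline for a Phi-3.5 checkpoint.

    Requires `pip install transformers` (plus torch) and will download
    the model weights on first call.
    """
    from transformers import pipeline  # imported lazily; heavy dependency
    return pipeline("text-generation", model=model_id, trust_remote_code=True)

# Example (downloads several GB of weights):
#   generator = load_phi_chat()
#   print(generator("Explain mixture-of-experts in one sentence."))
```

Note that the actual download is deferred until `load_phi_chat` is called, so the module itself stays lightweight.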

Phi-3.5-MoE: A Mixture of Experts Breakthrough

Phi-3.5-MoE stands out as the first model in the Phi family to use Mixture of Experts (MoE) technology. It combines 16 experts of 3.8B parameters each but activates only 6.6B parameters per token (two experts), and it was trained on 4.9T tokens using 512 H100 GPUs. The Microsoft Research team designed the model from scratch to boost its performance. In standard AI benchmarks, Phi-3.5-MoE outperforms Llama-3.1 8B, Gemma-2 9B, and Gemini 1.5 Flash, and comes close to the current leader, GPT-4o-mini.
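The top-2 routing that keeps only 6.6B of the parameters active can be sketched in plain Python. The gating and expert functions below are toy stand-ins to show the mechanism, not Microsoft's implementation:

```python
import math

def top2_moe(x, experts, gate_weights):
    """Route input x through the top-2 of len(experts) expert functions,
    mirroring (in miniature) the 16-expert / 2-active design described
    for Phi-3.5-MoE. Names and shapes here are purely illustrative."""
    # Gate scores: one logit per expert (here a simple dot product).
    logits = [sum(wi * xi for wi, xi in zip(w, x)) for w in gate_weights]
    # Select the two highest-scoring experts.
    top2 = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    # Softmax over just the selected logits to get mixing weights.
    m = max(logits[i] for i in top2)
    exps = {i: math.exp(logits[i] - m) for i in top2}
    total = sum(exps.values())
    # Only the two chosen experts run; the other experts stay inactive,
    # which is why the active parameter count is far below the total.
    out = [0.0] * len(x)
    for i in top2:
        weight = exps[i] / total
        out = [o + weight * yi for o, yi in zip(out, experts[i](x))]
    return out, top2
```

Because the router picks just two experts per token, compute and memory bandwidth per token scale with the active parameters (6.6B) rather than with all 16 experts combined.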

Phi-3.5-mini: Lightweight and Powerful

Phi-3.5-mini is a 3.8B-parameter model that surpasses Llama-3.1 8B and Mistral 7B and is even competitive with Mistral NeMo 12B. It was trained on 3.4T tokens using 512 H100 GPUs. With just 3.8B active parameters, the model holds its own on multilingual tasks against LLMs with far more active parameters. Additionally, Phi-3.5-mini supports a 128K context length, while its main competitor, the Gemma-2 family, supports only 8K.

Phi-3.5-vision: Enhanced Multi-Frame Image Understanding

Phi-3.5-vision is a 4.2B-parameter model trained on 500B tokens using 256 A100 GPUs, and it now supports multi-frame image understanding and reasoning. Phi-3.5-vision improves performance on MMMU (from 40.2 to 43.0), MMBench (from 80.5 to 81.9), and the document-understanding benchmark TextVQA (from 70.9 to 72.0).

Microsoft is expected to share more details on the Phi-3.5 family of models later today. The release showcases advances in AI model efficiency and capability; with its focus on lightweight design and multimodal understanding, the Phi-3.5 family may see broader adoption across a range of AI applications.
