Today, Microsoft announced the release of the Phi-3.5 family of models, which includes Phi-3.5-vision, Phi-3.5-MoE, and Phi-3.5-mini. These lightweight models are trained on synthetic data and filtered, publicly available web data, and they support a 128K token context length. All models are now available on Hugging Face under an MIT license.
Phi-3.5-MoE: A Mixture of Experts Breakthrough
The Phi-3.5-MoE stands out as the first model in the Phi family to leverage Mixture of Experts (MoE) technology. It combines 16 experts of 3.8B parameters each but activates only 6.6B parameters per token (two experts), and it was trained on 4.9T tokens using 512 H100 GPUs. The Microsoft Research team designed and trained the model from scratch to boost its performance. On standard AI benchmarks, Phi-3.5-MoE outperforms Llama-3.1 8B, Gemma-2-9B, and Gemini-1.5-Flash and comes close to the current leader, GPT-4o-mini.
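To make the "two active experts per token" idea concrete, here is a minimal, toy mixture-of-experts layer with top-2 routing. The 16-expert count and top-2 selection mirror the reported Phi-3.5-MoE configuration, but the dimensions and routing details below are illustrative assumptions, not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy MoE feed-forward layer with top-2 routing (illustrative only)."""
    def __init__(self, d_model=64, d_ff=256, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # per-token gating scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        gate_logits = self.router(x)
        weights, chosen = torch.topk(gate_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the top-k experts run for each token; the rest stay inactive,
        # which is why the active parameter count is far below the total.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64])
```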
Phi-3.5-mini: Lightweight and Powerful
The Phi-3.5-mini is a 3.8B parameter model that surpasses Llama-3.1 8B and Mistral 7B and is even competitive with Mistral NeMo 12B. It was trained on 3.4T tokens using 512 H100 GPUs. With just 3.8B active parameters, the model is competitive on multilingual tasks with LLMs that use many more active parameters. Additionally, Phi-3.5-mini supports a 128K context length, while its main competitor, the Gemma-2 family, only supports 8K.
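Since the weights are on Hugging Face under an MIT license, running the mini model with the transformers library should look roughly like the sketch below. The repository id, dtype, and generation settings are assumptions to verify against the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; confirm the exact name on the Hugging Face hub.
model_id = "microsoft/Phi-3.5-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumed dtype; adjust to your hardware
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain mixture-of-experts models in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Strip the prompt tokens and decode only the newly generated answer.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```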
Phi-3.5-vision: Enhanced Multi-Frame Image Understanding
The Phi-3.5-vision is a 4.2B parameter model trained on 500B tokens using 256 A100 GPUs. It adds support for multi-frame image understanding and reasoning. Phi-3.5-vision shows improved performance on MMMU (from 40.2 to 43.0), MMBench (from 80.5 to 81.9), and the document understanding benchmark TextVQA (from 70.9 to 72.0).
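Multi-frame input is typically expressed by numbering image placeholders in the prompt. The sketch below follows the pattern used by earlier Phi vision models on Hugging Face; the repository id, placeholder tokens, and processor arguments are assumptions to check against the Phi-3.5-vision model card.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"  # assumed repository id

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Two frames from the same clip; the file paths are placeholders.
frames = [Image.open("frame_001.png"), Image.open("frame_002.png")]

# Phi vision models reference images via numbered <|image_N|> placeholders.
messages = [{
    "role": "user",
    "content": "<|image_1|>\n<|image_2|>\nWhat changed between these two frames?",
}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, frames, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens after the prompt.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```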
Microsoft is expected to share more details on the Phi-3.5 family of models later today. Microsoft's Phi-3.5 release showcases advances in AI model efficiency and capability. With its focus on lightweight design and multi-modal understanding, the Phi-3.5 family may see broader adoption across a range of AI applications.