
Microsoft shares more details on Maia 100, its first custom AI chip


During Ignite 2023, Microsoft first announced that it had developed its own AI accelerator chip called Maia. Earlier this year, during the Build developer conference, Microsoft shared more details on Azure Maia 100, its first in-house AI accelerator. Maia 100 is one of the largest processors made on TSMC's 5nm node, and it is designed specifically for large-scale AI workloads deployed in Azure.

Yesterday, at Hot Chips 2024, Microsoft shared Maia 100's specifications for the first time and revealed many more details. You can find a summary of them below.

Maia 100 Specs:

  • Chip size – 820 mm²
  • Packaging – TSMC N5 process with CoWoS-S interposer technology
  • HBM bandwidth/capacity – 1.8 TB/s @ 64 GB HBM2E
  • Peak dense tensor compute – 3 POPS (6-bit), 1.5 POPS (9-bit), 0.8 POPS (BF16)
  • L1/L2 – 500 MB
  • Backend network bandwidth – 600 GB/s (12 x 400 GbE)
  • Host bandwidth (PCIe) – 32 GB/s (PCIe Gen5 x8)
  • Design to TDP – 700 W
  • Provision TDP – 500 W
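
For readers who want to see where the headline bandwidth numbers come from, here is a quick back-of-the-envelope check; this is ordinary arithmetic, nothing Maia-specific (the ~4 GB/s per PCIe Gen5 lane figure accounts for encoding overhead):

```python
# Sanity-checking the published bandwidth figures.
links, gbps_per_link = 12, 400          # 12 x 400 GbE backend ports
backend_gbps = links * gbps_per_link    # 4800 Gbps aggregate
backend_gb_s = backend_gbps / 8         # 600 GB/s, matching the spec sheet

pcie_gen5_lane_gb_s = 4                 # ~4 GB/s per PCIe Gen5 lane
host_gb_s = pcie_gen5_lane_gb_s * 8     # 32 GB/s for a Gen5 x8 link

print(backend_gb_s, host_gb_s)          # 600.0 32
```

The 4800 Gbps aggregate also lines up with the collective-communication figures Microsoft quotes for the interconnect below.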

The Microsoft Maia 100 system is vertically integrated to optimize cost and performance, pairing custom server boards with purpose-built racks and a co-designed software stack.

Maia 100 SoC Architecture:

  • A high-speed tensor unit delivers fast processing for training and inference while supporting a wide range of data types. This tensor unit is constructed as a 16xRx16 unit.
  • The vector processor is a loosely coupled superscalar engine built with a custom instruction set architecture (ISA) to support a wide range of data types, including FP32 and BF16.
  • A Direct Memory Access (DMA) engine supports different tensor sharding schemes (see the sketch after this list).
  • Hardware semaphores enable asynchronous programming on the Maia system.
  • To improve data utilization and power efficiency, the large L1 and L2 scratchpads are software-managed.
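
To make the DMA bullet above concrete, here is a minimal sketch of two common tensor sharding schemes. The torch.chunk calls are stock PyTorch used purely as an illustration of what sharding means; Maia's DMA engine performs this kind of data movement in hardware, and nothing here is a Maia-specific API:

```python
import torch

# One logical tensor split across memory regions or accelerators.
weights = torch.randn(4096, 4096)

row_shards = torch.chunk(weights, 4, dim=0)  # row-wise: 4 x (1024, 4096)
col_shards = torch.chunk(weights, 4, dim=1)  # column-wise: 4 x (4096, 1024)

print(row_shards[0].shape, col_shards[0].shape)
```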

Maia 100 uses an Ethernet-based interconnect with a custom RoCE-like protocol for ultra-high-bandwidth compute. It supports up to 4800 Gbps of all-gather and scatter-reduce bandwidth, and 1200 Gbps of all-to-all bandwidth.
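
Those figures describe collective-communication bandwidth. As a point of reference, here is what an all-gather looks like at the framework level with stock torch.distributed; the CPU-only "gloo" backend in this single-process sketch merely stands in for Maia's custom RoCE-like transport, which is not publicly available:

```python
import torch
import torch.distributed as dist

# Single-process illustration of the all-gather collective the
# bandwidth figures above refer to.
dist.init_process_group(
    backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)

local = torch.arange(4, dtype=torch.float32)  # this rank's shard
gathered = [torch.empty_like(local) for _ in range(dist.get_world_size())]
dist.all_gather(gathered, local)              # every rank receives all shards
print(gathered)

dist.destroy_process_group()
```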

On the software side, the Maia software development kit (SDK) lets developers quickly port their PyTorch and Triton models to Maia. The SDK includes several components that make it easy to deploy those models to Azure OpenAI Service.
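
As a rough sketch of what that porting story implies, consider the snippet below. The device string "maia" and the hasattr check are assumptions for illustration; Microsoft has not published the backend's device name. The rest is stock PyTorch, and the snippet falls back to CPU so it runs anywhere:

```python
import torch
import torch.nn as nn

# Hypothetical device name: "maia" is an assumption, not a published API.
device = "maia" if hasattr(torch, "maia") else "cpu"

model = nn.Linear(1024, 1024).to(device)  # the advertised one-line change
x = torch.randn(8, 1024, device=device)
print(model(x).shape)                     # torch.Size([8, 1024])
```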

Developers can choose from two programming models to program the Maia system. They can either use Triton, a popular open-source domain-specific language (DSL) for deep neural networks (DNNs), or the Maia API, a Maia-specific custom programming model built for maximum performance and finer-grained control. Maia also has native support for PyTorch, allowing developers to execute PyTorch models with a single-line change.
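
To give a feel for the Triton path, here is a standard Triton vector-add kernel. This is ordinary open-source Triton that targets GPUs today; per the article, the same DSL is one of the two entry points to Maia, but nothing below is Maia-specific:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which block am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                    # one block per 1024 elems
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# Usage (on a CUDA GPU today):
# x = torch.randn(4096, device="cuda"); y = torch.randn(4096, device="cuda")
# torch.testing.assert_close(add(x, y), x + y)
```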

With its advanced architecture, solid developer tools, and deep integration with Azure, Maia 100 is changing how Microsoft manages and executes AI workloads. It remains to be seen whether Microsoft will open up Maia 100 accelerators to third-party organizations, as Google has done with its TPUs and Amazon with its Trainium and Inferentia chips.

You can learn more about Maia 100 from Microsoft’s official blog post here.
