During Ignite 2023, Microsoft first announced that it had developed its own AI accelerator chip, called Maia. Earlier this year, during the Build developer conference, Microsoft shared more details on Azure Maia 100, its first in-house AI accelerator. Maia 100 is one of the largest processors made on TSMC's 5nm node, and it is designed specifically for large-scale AI workloads deployed in Azure.
Yesterday, at Hot Chips 2024, Microsoft shared detailed specifications of Maia 100 for the first time. A summary is below.
Maia 100 Specs:
- Chip Size – 820mm²
- Packaging – TSMC N5 process with CoWoS-S interposer technology
- HBM BW/Cap – 1.8TB/s @ 64GB HBM2E
- Peak Dense Tensor POPS – 6-bit: 3, 9-bit: 1.5, BF16: 0.8
- L1/L2 – 500MB
- Backend Network BW – 600GB/s (12x400GbE)
- Host BW (PCIe) – 32GB/s (PCIe Gen5 x8)
- Design TDP – 700W
- Provisioned TDP – 500W
The Microsoft Maia 100 system is vertically integrated to optimize for cost and performance. It pairs custom server boards with purpose-built racks and a software stack designed to improve performance.
Maia 100 SoC Architecture:
- A high-speed tensor unit that delivers fast processing for training and inference while supporting a wide range of data types. The tensor unit is constructed as a 16xRx16 unit.
- The vector processor is a loosely coupled superscalar engine built with a custom instruction set architecture (ISA) to support a wide range of data types, including FP32 and BF16.
- A Direct Memory Access (DMA) engine supports different tensor sharding schemes.
- Hardware semaphores enable asynchronous programming on the Maia system.
- To improve data utilization and power efficiency, large L1 and L2 scratchpads are software-managed (see the sketch after this list).
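Microsoft has not published the Maia programming interfaces, but software-managed scratchpads combined with hardware semaphores typically enable the classic double-buffering pattern: the DMA engine fills one on-chip buffer while the tensor unit consumes the other. The sketch below simulates that pattern in plain Python, using `threading.Semaphore` as a stand-in for hardware semaphores; every name in it is illustrative, not a Maia SDK API.

```python
import threading

# Illustrative simulation of double buffering with semaphores. A "DMA" thread
# fills one of two scratch-pad slots while the "tensor unit" consumes the
# other, so data movement and compute overlap. Not a Maia SDK API.
NUM_TILES = 8
scratch = [None, None]                                     # two L1 scratch-pad slots
ready = [threading.Semaphore(0), threading.Semaphore(0)]  # DMA -> compute handoff
free = [threading.Semaphore(1), threading.Semaphore(1)]   # compute -> DMA handoff

def dma_engine():
    # Producer: stream tiles from "HBM" (here, generated data) into scratch.
    for i in range(NUM_TILES):
        slot = i % 2
        free[slot].acquire()          # wait until the slot can be overwritten
        scratch[slot] = [i] * 4       # stand-in for an async DMA copy
        ready[slot].release()         # signal: tile i has landed on-chip

def tensor_unit():
    # Consumer: compute on each tile as soon as its semaphore fires.
    for i in range(NUM_TILES):
        slot = i % 2
        ready[slot].acquire()         # block until the DMA signals arrival
        total = sum(scratch[slot])    # stand-in for a tensor-unit operation
        print(f"tile {i}: sum={total}")
        free[slot].release()          # hand the buffer back to the DMA

t = threading.Thread(target=dma_engine)
t.start()
tensor_unit()
t.join()
```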
Maia 100 uses an Ethernet-based interconnect with a custom RoCE-like protocol for ultra-high-bandwidth compute. It supports up to 4800 Gbps of all-gather and scatter-reduce bandwidth, and 1200 Gbps of all-to-all bandwidth.
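These figures line up with the backend network numbers in the spec list above; a quick Python sanity check makes the relationship explicit:

```python
# Sanity-checking the interconnect figures from Microsoft's spec list:
# 12 links x 400 GbE = 4800 Gbps, the quoted all-gather / scatter-reduce
# figure, which in turn matches the 600GB/s backend network bandwidth.
links, gbps_per_link = 12, 400
total_gbps = links * gbps_per_link   # 4800 Gbps
total_gbytes = total_gbps / 8        # 600.0 GB/s (8 bits per byte)
print(total_gbps, total_gbytes)      # -> 4800 600.0
```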
On the software side, the Maia software development kit (SDK) lets developers quickly port their PyTorch and Triton models to Maia. The SDK includes several components that make it easy to deploy models to Azure OpenAI Service.
Developers can choose between two programming models for the Maia system: Triton, a popular open-source domain-specific language (DSL) for deep neural networks (DNNs), or the Maia API, a Maia-specific programming model built for maximum performance and more detailed control. Maia also has native support for PyTorch, allowing developers to execute PyTorch models with a single-line change (see the sketch below).
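To give a feel for the Triton path, here is a minimal elementwise-add kernel in standard open-source Triton. The kernel itself uses only documented Triton APIs; how a launch is compiled for and dispatched to a Maia device is not described in Microsoft's materials, so the Maia side should be treated as an assumption.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the inputs.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements            # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    # One program per BLOCK_SIZE elements, rounded up.
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

For the native PyTorch path, Microsoft has not published the device identifier, so the advertised single-line change presumably resembles moving a model to a Maia device with one `.to(...)` call, analogous to `model.to("cuda")`; that identifier is an assumption, not a documented API.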
With its advanced architecture, strong developer tools, and deep integration with Azure, Maia 100 is changing how Microsoft manages and executes AI workloads. It remains to be seen whether Microsoft will open up Maia 100 accelerators to third-party organizations, as Google has done with its TPUs and Amazon with its Trainium and Inferentia chips.
You can learn more about Maia 100 from Microsoft’s official blog post here.