Apple used Google's TPUs to train its foundation language models

At WWDC 2024, Apple introduced Apple Intelligence, a set of AI features that will be deeply integrated into iOS, iPadOS, and macOS. Yesterday, Apple began rolling out Apple Intelligence features in the new iOS 18.1 developer beta.

Apple Intelligence is powered by Apple's generative models that are fine-tuned for user experiences, such as writing and refining text, prioritizing and summarizing notifications, and more. Now, Apple has published a technical paper on the foundation language models that were used to develop Apple Intelligence features.

In the technical report, Apple revealed that Apple Intelligence features are primarily built on two models: a 3-billion-parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. Surprisingly, neither model was trained on the industry-favorite NVIDIA H100s; instead, Apple used Google TPUs.

The Apple foundation models (AFM) were pre-trained on v4 and v5p Cloud TPU clusters with the AXLearn framework, a JAX-based deep learning library. The AFM-server model was trained on 8,192 TPU v4 chips provisioned as 8 × 1,024-chip slices, with the slices connected to one another by the data-center network (DCN). The AFM-on-device model was trained on a single slice of 2,048 TPU v5p chips.
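To make the chip counts concrete, here is a minimal Python sketch that recomputes the totals from the slice layout described in the report. The TpuProvision class and its field names are hypothetical and exist only for illustration; they are not part of AXLearn, JAX, or any Google Cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TpuProvision:
    """Hypothetical record of how a training job's TPU chips are provisioned."""

    model: str
    chip: str
    num_slices: int        # slices are linked to each other over the DCN
    chips_per_slice: int   # chips inside one slice

    @property
    def total_chips(self) -> int:
        # Total chips = number of slices times chips in each slice.
        return self.num_slices * self.chips_per_slice


# Figures as reported in Apple's technical report.
AFM_SERVER = TpuProvision("AFM-server", "TPU v4", num_slices=8, chips_per_slice=1024)
AFM_ON_DEVICE = TpuProvision("AFM-on-device", "TPU v5p", num_slices=1, chips_per_slice=2048)

for job in (AFM_SERVER, AFM_ON_DEVICE):
    print(
        f"{job.model}: {job.num_slices} x {job.chips_per_slice} "
        f"{job.chip} chips = {job.total_chips}"
    )
```

The server model therefore used four times as many chips as the on-device model, although the two jobs ran on different TPU generations, so chip counts alone do not compare the compute spent.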

The Google Cloud TPU v4 was first announced in 2021 and, according to Google, offers nearly 10x the ML system performance of TPU v3. On a per-chip basis, the v4 outperforms TPU v3 by 2.1x on average and improves performance per watt by 2.7x.

Google describes the Cloud TPU v5p as its most powerful TPU to date. Each TPU v5p pod comprises 8,960 chips connected over the company's highest-bandwidth inter-chip interconnect (ICI), at 4,800 Gbps per chip, in a 3D torus topology. Compared with the TPU v4, the TPU v5p offers more than 2x the FLOPS and 3x the high-bandwidth memory (HBM). As a result, Google says the TPU v5p can train large language models 2.8x faster than the TPU v4.
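As a rough back-of-the-envelope check, the short Python sketch below puts those figures side by side with Apple's reported 2,048-chip v5p slice. The variable names are illustrative, and the "relative training time" is simply the reciprocal of Google's quoted 2.8x speedup, an assumption that ignores model, batch size, and software differences.

```python
# Figures quoted by Google for TPU v5p, plus the slice size from Apple's report.
V5P_POD_CHIPS = 8_960          # chips in one full TPU v5p pod
V5P_SPEEDUP_VS_V4 = 2.8        # Google's claimed LLM training speedup over TPU v4
AFM_ON_DEVICE_CHIPS = 2_048    # chips in the slice used for AFM-on-device

# Share of a full v5p pod occupied by the AFM-on-device training slice.
pod_fraction = AFM_ON_DEVICE_CHIPS / V5P_POD_CHIPS

# If a job were exactly 2.8x faster, it would take this fraction of the v4 time.
relative_training_time = 1 / V5P_SPEEDUP_VS_V4

print(f"Slice uses {pod_fraction:.1%} of a v5p pod")
print(f"Training time relative to TPU v4: {relative_training_time:.1%}")
```

In other words, the on-device model was trained on a bit under a quarter of a full v5p pod.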

Apple's decision to use Google TPUs instead of NVIDIA GPUs for its AI models could be attributed to several factors, including the limited availability of NVIDIA H100s and the potential cost-effectiveness of Google's offerings. Whatever the exact reasoning, the move gives Google Cloud a significant marketing opportunity to showcase the capabilities of its TPU infrastructure.

Source: Apple