Intel pushes PyTorch 2.5 forward with expanded Intel GPU support

Intel today announced its contributions to the recently released PyTorch 2.5, expanding support for Intel GPUs. PyTorch 2.5 now includes broader compatibility with various Intel GPUs, including Intel Arc discrete graphics, Intel Core Ultra processors with built-in Intel Arc graphics, and the Intel Data Center GPU Max Series.

With this expanded support, developers who want to fine-tune models, run inference, and experiment with PyTorch on Intel Core Ultra AI PCs can install PyTorch directly from preview and nightly binary releases for Windows, Linux, and Windows Subsystem for Linux. This makes Intel GPU-based systems easier to use for PyTorch-based AI development.
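A quick way to confirm that an installed build can see an Intel GPU is to query the torch.xpu namespace, which PyTorch uses for Intel GPU devices. The following is a minimal sketch rather than an official check: the helper name describe_install is made up for illustration, and the script assumes nothing about how PyTorch was installed, so it also handles the case where the package is missing.

```python
import importlib.util


def describe_install() -> str:
    """Report which device a local PyTorch install would use (hypothetical helper)."""
    # Handle the case where PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    # torch.xpu is the device namespace PyTorch uses for Intel GPUs.
    # Older builds may not ship it, hence the hasattr guard.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return f"xpu: {torch.xpu.get_device_name(0)}"
    return "cpu (no Intel GPU visible to this PyTorch build)"


print(describe_install())
```

If the script reports the CPU fallback on a machine with an Intel GPU, the likely causes are a PyTorch build without Intel GPU support or a missing GPU driver.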

Intel's key contributions to PyTorch 2.5 include:

Expanding the PyTorch hardware backend support matrix to encompass both Intel Data Center and Client GPUs.

Implementing SYCL kernels to enhance the coverage and execution of ATen operators on Intel GPUs, boosting performance in PyTorch eager mode.

Enhancing the Intel GPU backend of torch.compile to improve inference and training performance across various deep learning workloads.
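The difference between eager mode and torch.compile mentioned in the list above can be sketched as follows. This is an illustrative example, not Intel's benchmark code: the function name eager_vs_compiled, the tiny model, and the tensor shapes are assumptions chosen for brevity. The device argument defaults to "cpu" so the sketch runs anywhere; passing "xpu" assumes a PyTorch build with Intel GPU support and a working compiler toolchain.

```python
import torch
from torch import nn


def eager_vs_compiled(device: str = "cpu", backend: str = "inductor") -> float:
    """Run one model in eager mode and via torch.compile; return the max abs difference."""
    torch.manual_seed(0)
    # Small illustrative model; any nn.Module would do.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    model = model.to(device).eval()
    x = torch.randn(8, 16, device=device)

    with torch.no_grad():
        # Eager mode: each operator is dispatched as it is reached.
        eager_out = model(x)
        # torch.compile wraps the model; compilation happens on the first call.
        compiled_model = torch.compile(model, backend=backend)
        compiled_out = compiled_model(x)

    return (eager_out - compiled_out).abs().max().item()
```

On a machine with a supported Intel GPU, eager_vs_compiled("xpu") would exercise both paths on the GPU. The backend parameter is exposed so that the "eager" debugging backend can be used on systems without a compiler toolchain.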

Intel also highlighted that PyTorch 2.5 includes improvements and new features for its latest data center CPUs. The FP16 data type is now supported and optimized through Intel Advanced Matrix Extensions (Intel AMX) in both eager mode and TorchInductor, improving inference on these CPUs, including the new Intel Xeon 6 processors. Additionally, TorchInductor's C++ backend is now available on Windows, improving the development experience for AI developers who work on Windows.

Intel's contributions to PyTorch 2.5 signal its commitment to advancing AI development and providing developers with powerful tools and optimized hardware.

In addition to Intel's contributions, the PyTorch 2.5 release includes a new cuDNN backend for scaled dot product attention (SDPA), improving speed on NVIDIA H100 and newer GPUs. Furthermore, regional compilation within torch.compile reduces cold start times by allowing users to compile a repeated nn.Module (e.g., a transformer layer in an LLM) once, rather than recompiling it for every occurrence. The full release notes are available from the PyTorch project.
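Regional compilation can be sketched by applying torch.compile to each repeated layer instead of to the whole model. The example below is a minimal illustration under stated assumptions: Block, TinyModel, and compile_regions are hypothetical names, and the layer is a simplified stand-in for a transformer layer. How much compiled code is actually shared between identical layers depends on PyTorch internals and is not guaranteed by this sketch.

```python
import torch
from torch import nn


class Block(nn.Module):
    """Hypothetical stand-in for a repeated transformer layer."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection around a normalized feed-forward block.
        return x + self.ff(self.norm(x))


class TinyModel(nn.Module):
    """A stack of identical blocks, mimicking the repeated layers of an LLM."""

    def __init__(self, dim: int = 32, depth: int = 4):
        super().__init__()
        self.layers = nn.ModuleList([Block(dim) for _ in range(depth)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


def compile_regions(model: TinyModel, backend: str = "inductor") -> TinyModel:
    """Wrap each repeated block with torch.compile instead of compiling the full model."""
    model.layers = nn.ModuleList(
        [torch.compile(layer, backend=backend) for layer in model.layers]
    )
    return model
```

Because every block has the same structure, the compiler sees the same small region repeatedly rather than one large unrolled graph, which is where the reduced cold start time is expected to come from. The wrapped layers share parameters with the originals, so outputs should match the uncompiled model.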