NVIDIA has been making tons of money over the past year, thanks to companies such as Microsoft and OpenAI buying large numbers of its high-end GPUs to power their generative AI products. Today, NVIDIA announced a new generation of its AI GPUs that promises to offer Microsoft and others even more speed and performance for AI services.
In a press release, NVIDIA announced the HGX H200, a platform based on the company's Hopper architecture and built around its H200 Tensor Core GPU. It states:
The NVIDIA H200 is the first GPU to offer HBM3e — faster, larger memory to fuel the acceleration of generative AI and large language models, while advancing scientific computing for HPC workloads. With HBM3e, the NVIDIA H200 delivers 141GB of memory at 4.8 terabytes per second, nearly double the capacity and 2.4x more bandwidth compared with its predecessor, the NVIDIA A100.
NVIDIA claims that the new chip will offer nearly double the inference speed of the H100 when running the Llama 2 large language model.
The company stated that the biggest cloud services, including Microsoft Azure, Amazon Web Services, Google Cloud, and Oracle Cloud Infrastructure, have already signed up to buy the new HGX H200 GPU. It can be used in four-way and eight-way configurations that are compatible with both the hardware and software of older HGX H100 systems. NVIDIA added:
An eight-way HGX H200 provides over 32 petaflops of FP8 deep learning compute and 1.1TB of aggregate high-bandwidth memory for the highest performance in generative AI and HPC applications.
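That aggregate memory figure lines up with the per-GPU specification quoted above: eight H200s at 141GB each works out to roughly 1,128GB, or about 1.1TB of high-bandwidth memory across the system.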
In addition, a number of server hardware partners, including ASRock Rack, ASUS, Dell Technologies, GIGABYTE, Hewlett Packard Enterprise, Lenovo, and others, will be able to upgrade their existing H100 systems with the new H200 chip.
The HGX H200 GPU will be available from those server hardware partners and cloud providers sometime in the second quarter of 2024. NVIDIA previously announced its GH200 Grace Hopper generative AI platform, which is also slated to ship in the second quarter of 2024.