NVIDIA has announced TensorRT-LLM for Windows. This open-source library will allow PC developers with NVIDIA GeForce RTX graphics cards to speed up LLM inference by up to four times.
Offering more than twice the precision and inference speed of the previous generation, NVIDIA's new TensorRT 8 deep learning SDK completed BERT-Large inference in 1.2 ms.