Microsoft announced its new deep learning acceleration platform, codenamed Project Brainwave, today at the HOT CHIPS Symposium. The project aims to deliver real-time AI, which requires ultra-low latency so that the system can process requests as fast as it receives them. According to Doug Burger, an engineer at Microsoft, "Project Brainwave achieves a major leap forward in both performance and flexibility for cloud-based serving of deep learning models."
The Project Brainwave system is built from three main layers. First, Microsoft leveraged the massive Field-Programmable Gate Array (FPGA) infrastructure it has been deploying through Project Catapult over the last few years. By using high-performance FPGAs, the Project Brainwave team was able to serve Deep Neural Networks (DNNs) as hardware microservices. This reduces latency, because the CPU no longer needs to process incoming requests, and allows very high throughput, because the FPGA can process requests as fast as the network can stream them.
Second, the team used a powerful DNN Processing Unit (DPU) synthesized onto commercially available FPGAs, but took a different approach from other companies. Instead of fixing the chip's operators and data types at design time, which limits flexibility, Microsoft designed a chip that scales across a range of data types.
Third, Project Brainwave supports a wide range of popular deep learning frameworks, including Microsoft Cognitive Toolkit and Google's TensorFlow. To do so, the team defined a graph-based intermediate representation that bridges models trained in the popular frameworks with the company's high-performance infrastructure.
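To make the idea of a graph-based intermediate representation more concrete, here is a minimal sketch in Python. It is not Microsoft's format, whose details were not published in the announcement. The `Node` class, the operator names, and the `topological_order` helper are all hypothetical and only illustrate how a model imported from any framework could be reduced to a neutral graph of operations and then ordered for execution.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One operation in a hypothetical framework-neutral graph."""
    name: str
    op: str                                      # e.g. "input", "matmul", "sigmoid"
    inputs: list = field(default_factory=list)   # names of upstream nodes


def topological_order(nodes):
    """Return node names so that every node appears after its inputs.

    Assumes the graph is acyclic, as a feed-forward model graph would be.
    """
    by_name = {n.name: n for n in nodes}
    ordered, visited = [], set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        for upstream in by_name[name].inputs:
            visit(upstream)
        ordered.append(name)

    for n in nodes:
        visit(n.name)
    return ordered


# A tiny model as it might look after import from any framework
# (illustrative only): y = sigmoid(x @ w)
graph = [
    Node("y", "sigmoid", ["xw"]),
    Node("xw", "matmul", ["x", "w"]),
    Node("x", "input"),
    Node("w", "weights"),
]

order = topological_order(graph)
print(order)  # ['x', 'w', 'xw', 'y']
```

Once a model is in a form like this, a backend only has to understand the graph rather than each framework's own file format, which is the bridging role the article describes.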
Microsoft claims that its "system, designed for real-time AI, can handle complex, memory-intensive models such as Long short-term memories (LSTMs), without using batching to juice throughput". As a demonstration, the company used Intel's new 14 nm Stratix 10 FPGA and Microsoft's custom 8-bit floating point format ("ms-fp8") to run a large Gated Recurrent Unit (GRU) model at the HOT CHIPS Symposium today. The Stratix 10 sustained an impressive 39.5 teraflops while running the model, and each request completed in under one millisecond.
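As a rough sanity check on those two figures, the short Python sketch below multiplies the sustained throughput by the latency budget. It assumes, purely for illustration, that the full sustained rate is available to a single request for a full millisecond. The announcement does not state the actual size of the GRU model, so the result is only an upper bound, not a measured value.

```python
# Figures reported in the article.
sustained_flops = 39.5e12   # 39.5 teraflops sustained
latency_budget_s = 1e-3     # "under one millisecond" per request


def max_flops_per_request(throughput_flops, latency_s):
    """Upper bound on floating point operations that fit in one request."""
    return throughput_flops * latency_s


budget = max_flops_per_request(sustained_flops, latency_budget_s)
print(f"{budget / 1e9:.1f} GFLOP per request at most")
```

In other words, under these assumptions a single unbatched request could involve at most about 39.5 billion floating point operations.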
Finally, Microsoft plans to bring Project Brainwave to Azure customers and to use it to power other products, such as Bing. Unfortunately, no release date has been shared yet.
Source: Microsoft