Qualcomm gears up for AI inference revolution | Computer Weekly

By Computer Weekly
October 28, 2025


Qualcomm’s answer to Nvidia’s dominance in the AI acceleration market is a pair of new chips for server racks, the AI200 and AI250, based on its existing neural processing unit (NPU) technology.

Significantly, Qualcomm has developed a novel memory architecture for the AI250 based on near-memory computing, which it claims provides “a generational leap in efficiency and performance for AI inference workloads”. It does so, according to Qualcomm, by delivering greater than 10x higher effective memory bandwidth at much lower power consumption.
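Effective memory bandwidth matters because LLM inference at small batch sizes is typically memory-bound: each generated token requires streaming the model’s weights from memory, so bandwidth, not raw compute, sets the throughput ceiling. A back-of-envelope sketch makes the claim concrete (the numbers below are illustrative only, not Qualcomm figures):

```python
# Back-of-envelope: token throughput for a memory-bound LLM decode step.
# Each decode step must read every weight once, so throughput is roughly
# bandwidth / bytes_per_token. All numbers are illustrative assumptions.

def tokens_per_second(bandwidth_gb_s: float, params_billions: float,
                      bytes_per_param: int = 2) -> float:
    """Upper bound on decode tokens/sec for a bandwidth-bound model."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# A 70B-parameter model in 16-bit weights, at two hypothetical bandwidths:
base = tokens_per_second(bandwidth_gb_s=800, params_billions=70)
boosted = tokens_per_second(bandwidth_gb_s=8000, params_billions=70)
print(f"{base:.1f} -> {boosted:.1f} tokens/sec")  # 10x bandwidth, 10x ceiling
```

Under this simple model, a 10x gain in effective bandwidth translates directly into a 10x higher ceiling on decode throughput for bandwidth-bound workloads.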

Qualcomm positions the AI200 as purpose-built for running AI inference across a cluster of server racks, designed, the company claims, to deliver low total cost of ownership (TCO). The chip has been optimised for large language model (LLM) and multimodal model (MMM) inference and other AI workloads.

To accompany the AI200 and AI250, Qualcomm is providing a software stack, which it said offers “seamless compatibility with leading AI frameworks” and enables enterprises and developers to deploy secure, scalable generative AI across datacentres.

As analyst firm Forrester points out, these chips appear to target Nvidia’s and AMD’s GPU and rack-scale products. According to Forrester senior analyst Alvin Nguyen, the Qualcomm offerings make sense: the market for rack-scale AI inference is highly profitable, and the current providers of rack-based inference hardware are unable to fully satisfy demand.

“The core of their AI looks to be based on existing NPU designs, so this lowers their barrier to entry. It also seems that they are creating GPUs with larger memory capacity than Nvidia or AMD (768 GB) which could give it an advantage with certain AI workloads,” he added.

In a LinkedIn article posted in March, Jack Gold, strategic adviser and technology analyst at J Gold Associates, predicted that within two to three years, 85% of enterprise AI workloads will be inference-based, rather than the current predominance of training workloads. Training generally requires the high-performance AI-optimised server infrastructure that resides in hyperscale datacentres provided by the likes of AWS, Azure and the Google Cloud Platform.

Like many industry watchers, Gold believes that most AI workloads run by enterprises on hyperscale infrastructure are pilot projects. Evidence from numerous surveys of business and IT leaders shows that these projects often fail to mature into production. But as Gold points out, once an AI model has been trained, a process that requires high-performance graphics processing unit (GPU) acceleration hardware, it can then run on more modest hardware.

“Most enterprise AI workloads running today are still experimental and/or small scale. As AI moves to production level inference-based solutions, the need for high-end GPUs is less important and standard server SoCs [systems on a chip] are more appropriate,” Gold said.
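Gold’s point is that serving a trained model is just a forward pass with frozen weights: no gradients, no optimiser state, and far less compute than training. A minimal generic sketch (a made-up toy classifier, nothing to do with Qualcomm’s stack) shows how little work inference involves:

```python
import numpy as np

# Toy "trained" linear classifier: the weights below are frozen and made up.
# Inference never updates them, so serving needs only a matmul and an argmax,
# which is why it can run on far more modest hardware than training.
W = np.array([[0.8, -0.3],
              [0.1,  0.9]])   # pre-trained weight matrix (illustrative)
b = np.array([0.05, -0.1])    # pre-trained bias (illustrative)

def infer(x: np.ndarray) -> np.ndarray:
    """Single forward pass: logits = xW + b, then pick the top class."""
    logits = x @ W + b
    return logits.argmax(axis=-1)

batch = np.array([[1.0, 0.2],
                  [0.1, 1.5]])
print(infer(batch))  # one class index per input row
```

Training the same model would additionally require a loss, gradients and many optimisation passes over a large dataset, which is where the high-end GPU demand sits.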

This is the market opportunity Qualcomm is hoping to address with the AI200 and AI250 hardware. Durga Malladi, senior vice-president and general manager of technology planning, edge solutions and datacentre at Qualcomm Technologies, said the two products offer a way for organisations to run AI inference models more easily.

“With seamless compatibility for leading AI frameworks and one-click model deployment, Qualcomm AI200 and AI250 are designed for frictionless adoption and rapid innovation,” Malladi added.



