>_TheQuery

GPU

Fundamentals

A Graphics Processing Unit - a specialized processor designed for parallel computation, now essential for training and running AI models due to its ability to perform thousands of operations simultaneously.

A GPU (Graphics Processing Unit) is a processor originally designed to accelerate graphics rendering by performing many calculations in parallel. Unlike CPUs, which have a few powerful cores optimized for sequential tasks, GPUs contain thousands of smaller cores that can execute operations simultaneously. This massively parallel architecture turns out to be ideal for the matrix multiplications and tensor operations that dominate machine learning workloads.
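The one-thread-per-element mapping can be sketched in plain Python. This is a conceptual illustration only, not real GPU code: the loop below runs sequentially, but each iteration is independent, which is exactly what lets a GPU assign every index to its own thread.

```python
def vector_add(a, b):
    """Element-wise addition, written so each index is independent.

    On a GPU, each loop iteration would execute as a separate thread in
    parallel; here they run one after another only because this is a
    CPU-side illustration of the mapping.
    """
    n = len(a)
    out = [0.0] * n
    for thread_idx in range(n):  # each "thread" owns one output element
        out[thread_idx] = a[thread_idx] + b[thread_idx]
    return out

print(vector_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Because no iteration reads another's result, the same pattern scales to the matrix multiplications at the heart of neural networks, where each output element is an independent dot product.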

In AI, GPUs are used for both training and inference. During training, a model must process large batches of data and update millions or billions of parameters through backpropagation - all of which involves dense linear algebra that maps naturally onto GPU hardware. What might take weeks on a CPU can often be completed in hours or days on a modern GPU. NVIDIA dominates the AI GPU market with its CUDA ecosystem and data center GPUs like the A100 and H100, though AMD and Intel are competing with alternatives.
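A commonly cited rule of thumb makes the scale of training compute concrete: roughly 6 floating-point operations per parameter per training token (about 2 for the forward pass and 4 for backpropagation). The model size, token count, and per-GPU throughput below are assumed illustrative figures, not measurements.

```python
def training_flops(params, tokens):
    # Rule of thumb: ~6 FLOPs per parameter per training token
    # (~2 forward, ~4 backward).
    return 6 * params * tokens

# Hypothetical example: a 7-billion-parameter model on 1 trillion tokens.
flops = training_flops(7e9, 1e12)

# Dividing by an assumed sustained throughput (here 4e14 FLOP/s for a
# single modern data center GPU) gives a rough wall-clock estimate;
# real runs spread this across many GPUs.
days_one_gpu = flops / 4e14 / 86400
print(f"{flops:.2e} FLOPs, roughly {days_one_gpu:.0f} GPU-days")
```

Estimates like this are why training runs are budgeted in GPU-hours and why a single GPU, however fast, is never enough for large models.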

GPU memory (VRAM) is a key constraint in AI workloads. A model's parameters, activations, gradients, and KV cache must all fit in VRAM during operation. This has driven techniques like model parallelism (splitting models across multiple GPUs), quantization (reducing parameter precision to use less memory), and offloading (moving data between GPU and system RAM). The cost and availability of GPUs remain among the most significant bottlenecks in AI development, influencing everything from research pace to model deployment strategies.
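The VRAM pressure that motivates quantization can be estimated with simple arithmetic: weight memory is just parameter count times bytes per parameter. The 70-billion-parameter figure below is a hypothetical example, and the result covers weights only; activations, gradients, optimizer state, and the KV cache all add more.

```python
def param_memory_gb(n_params, bytes_per_param):
    """Approximate memory for model weights alone, in gigabytes.

    Activations, gradients, optimizer state, and the KV cache
    consume additional VRAM on top of this.
    """
    return n_params * bytes_per_param / 1e9

# A hypothetical 70-billion-parameter model at common precisions:
for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{param_memory_gb(70e9, nbytes):.0f} GB")
```

The fp32 figure exceeds the VRAM of any single current GPU, which is why such models are served quantized, sharded across devices, or both.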

Last updated: February 26, 2026