NVIDIA has introduced a new graphics processor that, it hopes, will deliver the computational power required for “massive-context processing” in artificial intelligence systems, at a claimed million-token scale.
“The Vera Rubin platform will mark another leap in the frontier of AI computing — introducing both the next-generation Rubin GPU and a new class of processors called CPX,” says Jensen Huang, NVIDIA founder and chief executive officer, of the company’s latest launch. “Just as RTX revolutionized graphics and physical AI, Rubin CPX is the first CUDA GPU purpose-built for massive-context AI, where models reason across millions of tokens of knowledge at once.”
NVIDIA has unveiled a new class of GPU that it hopes will push LLMs, VLMs, and other models to a million-token context window: Rubin CPX. (📷: NVIDIA)
The large language models (LLMs) underpinning the current AI boom are statistical token manipulators: trained on vast troves of often-illegitimately-acquired data, they boil everything down into “tokens,” then, when presented with an input prompt that has itself been turned into tokens, respond with the most statistically likely tokens in continuation. If all has gone well, those tokens represent an answer to your query; otherwise, they represent an answer-shaped object that, the LLM being entirely incapable of anything resembling thought or reasoning regardless of marketing departments’ claims to the contrary, may bear little or no resemblance to fact or reality.
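That “most statistically likely continuation” loop can be sketched in a few lines of Python. This is a deliberately toy bigram model, nothing remotely like a production LLM or NVIDIA’s stack, but the underlying idea is the same: reduce text to tokens, then repeatedly emit the token most likely to follow.

```python
from collections import Counter, defaultdict

# Toy illustration: "train" on a tiny corpus by counting which token
# follows which, then continue a prompt greedily. Real LLMs use neural
# networks over enormous corpora, but the token-in, likeliest-token-out
# shape of the loop is the same.
corpus = "the cat sat on the mat the cat ate the food".split()

# Bigram counts: how often each token follows each other token.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def continue_prompt(prompt, steps=3):
    tokens = prompt.split()  # crude stand-in for real tokenization
    for _ in range(steps):
        candidates = follows[tokens[-1]]
        if not candidates:
            break
        # Greedily append the statistically likeliest next token.
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

print(continue_prompt("the cat"))
```

Feed it “the cat” and it dutifully produces an answer-shaped continuation from its counts, with exactly as much understanding as that implies.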
The more tokens you can provide, the more likely the answer-shaped token stream you receive will be of use — but the computational cost climbs sharply with sequence length, leaving most models restricted to relatively small “context windows.” This is where Rubin, named for astronomer and physicist Vera Rubin, comes in: NVIDIA claims it provides a way to scale LLMs and other generative AI models, including the image and video generation models that work similarly, to context windows of up to one million tokens.
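Why the cost climbs so sharply is easy to see with a back-of-envelope calculation: standard self-attention scores every token against every other token in the context, so the work grows with the square of the context length.

```python
# Back-of-envelope: the attention score matrix in a standard transformer
# layer has one entry per (token, token) pair, so its size grows
# quadratically with context length.
def attention_pairs(context_len):
    return context_len ** 2

for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9} tokens -> {attention_pairs(n):.1e} pairwise scores")
```

Going from a thousand tokens to a million multiplies the context by a thousand but the pairwise work by a million — which is why million-token windows demand dedicated hardware rather than just patience.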
The Rubin CPX, NVIDIA claims, delivers up to 30 peta floating-point operations per second (petaFLOPS) of NVFP4-precision compute and includes 128GB of GDDR7 memory, trading the performance of high-bandwidth memory for the ability to cram more onto the board. Compared with NVIDIA’s Grace-Blackwell GB300 NVL72 systems, the company says it can deliver a tripling in attention performance — a measure of a model’s ability to process context sequences.
A rack packed with 144 Rubin CPX, 144 Rubin, and 36 Vera chips will deliver a claimed eight exaFLOPS of NVFP4 compute. (📷: NVIDIA)
The company isn’t expecting anyone to make use of a single Rubin CPX, though: NVIDIA envisions the boards being combined with non-CPX Rubin GPUs and Vera CPUs, and has shown off a fully-stocked rack implementation dubbed the Vera Rubin NVL144 CPX — a combination of 144 Rubin CPX GPUs, 144 plain Rubin GPUs, and 36 Vera CPUs, for a total of eight exaFLOPS of NVFP4 compute. While that’s unlikely to come cheap, NVIDIA makes a bold claim of profitability: $100 million spent on its Rubin-based hardware could deliver, the company claims, “as much as” $5 billion in revenue.
More information on the Rubin CPX is available on the NVIDIA Developer Technical Blog; hardware is expected to become available at the end of next year, at an as-yet unannounced price point.