Energy-Efficient NPU Technology Cuts AI Power Use by 44%

July 10, 2025

Researchers at the Korea Advanced Institute of Science and Technology (KAIST) have developed energy-efficient NPU technology that demonstrates substantial performance improvements in laboratory testing.

Their specialised AI chip ran AI models 60% faster while using 44% less electricity than the graphics cards currently powering most AI systems, based on results from controlled experiments.

To put it simply, the research, led by Professor Jongse Park from KAIST’s School of Computing in collaboration with HyperAccel Inc., addresses one of the most pressing challenges in modern AI infrastructure: the enormous energy and hardware requirements of large-scale generative AI models.

Current systems such as OpenAI’s ChatGPT-4 and Google’s Gemini 2.5 demand not only high memory bandwidth but also substantial memory capacity, driving companies like Microsoft and Google to purchase hundreds of thousands of NVIDIA GPUs.

The memory bottleneck challenge

The core innovation lies in the team’s approach to solving the memory bottleneck issues that plague existing AI infrastructure. Their energy-efficient NPU technology focuses on “lightweighting” the inference process while minimising accuracy loss, a critical balance that has proven challenging for previous solutions.

PhD student Minsu Kim and Dr Seongmin Hong from HyperAccel Inc., serving as co-first authors, presented their findings at the 2025 International Symposium on Computer Architecture (ISCA 2025) in Tokyo. The research paper, titled “Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization,” details their comprehensive approach to the problem.

The technology centres on quantising the KV cache, which the researchers identify as accounting for most memory usage in generative AI systems. By optimising this component, the team enables the same level of AI infrastructure performance using fewer NPU devices compared to traditional GPU-based systems.
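To give a rough sense of why the KV cache dominates memory use, the following minimal sketch estimates its size at two bit widths. The model shape is hypothetical and chosen only for illustration; it does not come from the paper. The 4-bit figure also ignores the extra storage for scales and outliers that a real quantisation scheme needs.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bits_per_value):
    """Approximate KV cache size: one key and one value vector per layer, head and token."""
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size  # 2 = keys + values
    return values * bits_per_value // 8

# Hypothetical model shape and workload, for illustration only.
shape = dict(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096, batch_size=16)

fp16 = kv_cache_bytes(bits_per_value=16, **shape)
int4 = kv_cache_bytes(bits_per_value=4, **shape)

print(f"16-bit cache: {fp16 / 2**30:.1f} GiB")  # 32.0 GiB
print(f"4-bit cache:  {int4 / 2**30:.1f} GiB")  # 8.0 GiB
```

Under these assumptions the cache shrinks by a factor of four, which is the kind of saving that lets the same workload fit on fewer devices.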

Technical innovation and architecture

The KAIST team’s energy-efficient NPU technology employs a three-pronged quantisation algorithm: threshold-based online-offline hybrid quantisation, group-shift quantisation, and fused dense-and-sparse encoding. This approach allows the system to integrate with existing memory interfaces without requiring changes to the operational logic of current NPU architectures.
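The article does not describe the algorithm in detail, so the following is only a simplified sketch of the general idea behind threshold-based quantisation with a dense and a sparse part. It assumes a magnitude threshold chosen offline from calibration data, then a cheap online step that stores most values as low-bit integers and keeps the few outliers at full precision. The function names and the 90% cut-off are hypothetical, and the sketch does not attempt group-shift quantisation or the paper's actual encoding.

```python
def pick_threshold(calibration, keep_fraction=0.9):
    """Offline step (assumed): choose a magnitude threshold from calibration data."""
    mags = sorted(abs(v) for v in calibration)
    index = min(len(mags) - 1, int(keep_fraction * len(mags)))
    return mags[index]

def quantise(values, threshold, bits=4):
    """Online step (assumed): low-bit codes for inliers, exact values for outliers."""
    levels = 2 ** (bits - 1) - 1              # symmetric signed range, e.g. -7..7
    scale = threshold / levels if threshold > 0 else 1.0
    dense = []                                 # one small integer per position
    sparse = {}                                # position -> full-precision outlier
    for i, v in enumerate(values):
        if abs(v) > threshold:
            sparse[i] = v                      # outlier: kept exactly
            dense.append(0)                    # placeholder in the dense part
        else:
            dense.append(round(v / scale))
    return dense, sparse, scale

def dequantise(dense, sparse, scale):
    """Rebuild approximate values, preferring the exact outlier where one exists."""
    return [sparse.get(i, q * scale) for i, q in enumerate(dense)]

values = [0.5, -0.25, 8.0, 0.0]
dense, sparse, scale = quantise(values, threshold=1.0)
restored = dequantise(dense, sparse, scale)
```

In this toy example only the value 8.0 exceeds the threshold, so it is the only entry stored at full precision; every other value is recovered to within half a quantisation step.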

The hardware architecture incorporates page-level memory management techniques for efficient utilisation of limited memory bandwidth and capacity. Additionally, the team introduced new encoding techniques specifically optimised for the quantised KV cache, addressing the unique requirements of their approach.
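As a loose illustration of what page-level management of a KV cache can look like, the toy allocator below hands out fixed-size pages on demand and recycles them when a sequence finishes. It is a generic software sketch under assumed names (`PagedKVCache`, `append_token`, `release`), not a description of the team's hardware design.

```python
class PagedKVCache:
    """Toy allocator: KV entries live in fixed-size pages handed out on demand."""

    def __init__(self, num_pages, tokens_per_page):
        self.tokens_per_page = tokens_per_page
        self.free_pages = list(range(num_pages))
        self.page_table = {}   # sequence id -> list of page numbers
        self.lengths = {}      # sequence id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve room for one more token; return (page, slot) for its KV entry."""
        pages = self.page_table.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length == len(pages) * self.tokens_per_page:   # no room left in the last page
            if not self.free_pages:
                raise MemoryError("no free KV cache pages")
            pages.append(self.free_pages.pop())
        self.lengths[seq_id] = length + 1
        return pages[length // self.tokens_per_page], length % self.tokens_per_page

    def release(self, seq_id):
        """Return a finished sequence's pages to the free pool."""
        self.free_pages.extend(self.page_table.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_pages=2, tokens_per_page=4)
slots = [cache.append_token("a") for _ in range(5)]   # the fifth token needs a second page
```

Because memory is claimed one page at a time, a short sequence never reserves space for the longest possible context, which is the main appeal of paging when capacity is limited.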

“This research, through joint work with HyperAccel Inc., found a solution in generative AI inference light-weighting algorithms and succeeded in developing a core NPU technology that can solve the memory problem,” Professor Park explained.

“Through this technology, we implemented an NPU with over 60% improved performance compared to the latest GPUs by combining quantisation techniques that reduce memory requirements while maintaining inference accuracy.”

Sustainability implications

The environmental impact of AI infrastructure has become a growing concern as generative AI adoption accelerates. The energy-efficient NPU technology developed by KAIST offers a potential path toward more sustainable AI operations.

With 44% lower power consumption compared to current GPU solutions, widespread adoption could significantly reduce the carbon footprint of AI cloud services. However, the technology’s real-world impact will depend on several factors, including manufacturing scalability, cost-effectiveness, and industry adoption rates.
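The two headline figures can be combined into a rough energy estimate. The calculation below assumes that “60% faster” means 1.6 times the throughput and that the 44% power reduction applies to the same workload at the same time. The article does not state either point explicitly, so the result is indicative only.

```python
speedup = 1.60          # 60% faster than the GPU baseline (figure from the article)
relative_power = 0.56   # 44% lower power draw (figure from the article)

relative_time = 1 / speedup                        # time to finish the same task
relative_energy = relative_power * relative_time   # energy = power x time
perf_per_watt = speedup / relative_power

print(f"energy per task: {relative_energy:.2f}x the GPU baseline")
print(f"throughput per watt: {perf_per_watt:.2f}x the GPU baseline")
```

Under those assumptions, each task would use roughly a third of the baseline energy, a larger saving than the 44% power figure alone suggests.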

The researchers acknowledge that their solution represents a significant step forward, but widespread implementation will require continued development and industry collaboration.

Industry context and future outlook

The timing of this energy-efficient NPU technology breakthrough is particularly relevant as AI companies face increasing pressure to balance performance with sustainability. The current GPU-dominated market has created supply chain constraints and elevated costs, making alternative solutions increasingly attractive.

Professor Park noted that the technology “has demonstrated the possibility of implementing high-performance, low-power infrastructure specialised for generative AI, and is expected to play a key role not only in AI cloud data centres but also in the AI transformation (AX) environment represented by dynamic, executable AI such as agentic AI.”

The research represents a significant step toward more sustainable AI infrastructure, but its ultimate impact will be determined by how effectively it can be scaled and deployed in commercial environments. As the AI industry continues to grapple with energy consumption concerns, innovations like KAIST’s energy-efficient NPU technology offer hope for a more sustainable future in artificial intelligence computing.

(Image by Korea Advanced Institute of Science and Technology)
