Nous Research has released Hermes 4, a family of open-weight models (14B, 70B, and 405B parameter sizes, based on Llama 3.1 checkpoints) that achieves frontier-level performance through pure post-training techniques. Hermes 4 introduces hybrid reasoning: models can toggle between standard responses and explicit reasoning, wrapped in <think>...</think> tags, when complex problems require deeper deliberation.
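As a rough illustration of what hybrid output looks like to a downstream consumer, the sketch below separates the reasoning trace from the final answer. It assumes the reasoning is wrapped in `<think>...</think>` tags; the helper name `split_reasoning` is hypothetical and not part of any Hermes tooling.

```python
import re

# Assumption: explicit reasoning appears inside <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty for a standard response."""
    match = THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Everything outside the tagged span is treated as the visible answer.
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer


reasoned = "<think>2 + 2 is 4, doubled is 8.</think>The answer is 8."
plain = "The answer is 8."
print(split_reasoning(reasoned))
print(split_reasoning(plain))
```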
What makes Hermes 4 particularly significant is that it reaches state-of-the-art performance among open-weight models while maintaining full transparency and a neutral alignment philosophy, demonstrating that sophisticated reasoning capabilities can be developed entirely through open-source methodologies.
DataForge: Graph-Based Synthetic Data Generation
DataForge is the core data-generation component behind Hermes 4: a graph-based synthetic data generation system that changes how training data is created. Unlike traditional curation approaches, DataForge operates through a directed acyclic graph (DAG) in which each node implements a PDDL (Planning Domain Definition Language) action interface.
Each node specifies preconditions, postconditions, and transformations, which enables the automatic creation of complex data pipelines. Using pre-training seed data from DCLM and FineWeb, the system can, for example, transform a Wikipedia article into a rap song and then generate instruction-answer pairs based on that transformation.
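The node contract described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DataForge's actual code: the `Node` class, `run_pipeline`, and the two stub transformations are hypothetical, and in a real pipeline each transformation would be a model call rather than a string operation.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable

Sample = dict  # a record flowing through the pipeline


@dataclass
class Node:
    """Hypothetical PDDL-style action: precondition, transformation, postcondition."""
    name: str
    precondition: Callable[[Sample], bool]
    transform: Callable[[Sample], Sample]
    postcondition: Callable[[Sample], bool]


def run_pipeline(nodes: dict[str, Node], deps: dict[str, set[str]], seed: Sample) -> Sample:
    """Run nodes in topological order, checking each node's contract."""
    sample = dict(seed)
    for name in TopologicalSorter(deps).static_order():
        node = nodes[name]
        if not node.precondition(sample):
            raise ValueError(f"precondition failed at {name}")
        sample = node.transform(sample)
        if not node.postcondition(sample):
            raise ValueError(f"postcondition failed at {name}")
    return sample


# Stub transformations standing in for model calls.
NODES = {
    "to_rap": Node(
        "to_rap",
        precondition=lambda s: s["kind"] == "article",
        transform=lambda s: {**s, "kind": "rap", "text": "Yo, " + s["text"]},
        postcondition=lambda s: s["kind"] == "rap",
    ),
    "make_qa": Node(
        "make_qa",
        precondition=lambda s: s["kind"] == "rap",
        transform=lambda s: {**s, "kind": "qa",
                             "instruction": "Explain this rap verse.",
                             "answer": s["text"]},
        postcondition=lambda s: "instruction" in s and "answer" in s,
    ),
}
DEPS = {"to_rap": set(), "make_qa": {"to_rap"}}  # node -> nodes it depends on

result = run_pipeline(NODES, DEPS, {"kind": "article", "text": "the Nile is a river."})
print(result["kind"], "|", result["instruction"])
```

Because every node declares what it needs and what it guarantees, a malformed intermediate sample fails loudly at the node boundary instead of leaking into the training set.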
This approach generates approximately 5 million samples totaling 19 billion tokens. Reasoning samples are intentionally token-heavy, averaging five times more tokens than their non-reasoning counterparts, to accommodate thinking traces up to 16,000 tokens long.


Rejection Sampling at Unprecedented Scale
Hermes 4 uses Atropos, Nous Research's open-source reinforcement learning environment, to implement rejection sampling across approximately 1,000 different task-specific verifiers. This large verification infrastructure filters for high-quality reasoning trajectories across diverse domains.
Key verification environments include Answer Format Training (rewarding correct formatting across 150+ output formats), Instruction Following (using RLVR-IFEval tasks with complex constraints), Schema Adherence (for JSON generation using Pydantic models), and Tool Use training for agentic behavior.
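To make the idea of a schema-adherence verifier concrete, here is a minimal sketch. The article says the real environment uses Pydantic models; this example swaps in a hand-rolled check built on the standard library so that it runs without extra dependencies. The `SCHEMA` fields and the `verify_schema` function are hypothetical.

```python
import json

# Hypothetical target schema: field name -> required JSON-decoded Python type.
SCHEMA = {"name": str, "age": int, "tags": list}


def verify_schema(completion: str, schema: dict[str, type]) -> bool:
    """Accept a completion only if it is JSON with exactly the expected typed fields."""
    try:
        data = json.loads(completion)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != set(schema):
        return False
    # Exact type match, so that JSON `true` is not accepted where an int is required.
    return all(type(data[key]) is expected for key, expected in schema.items())


print(verify_schema('{"name": "Ada", "age": 36, "tags": ["math"]}', SCHEMA))
print(verify_schema('{"name": "Ada", "age": "36", "tags": []}', SCHEMA))
```

A binary verifier like this is all rejection sampling needs: it does not have to score a completion, only accept or reject it.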
The rejection sampling process creates a large corpus of verified reasoning trajectories, often with multiple distinct solution paths to the same verified result. This encourages the model to learn robust reasoning patterns rather than memorize specific solution templates.
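The core loop of rejection sampling is simple enough to sketch. The version below is an illustration only, not the Atropos API: `rejection_sample`, the canned generator, and the answer-checking verifier are all hypothetical stand-ins. It keeps every distinct candidate that passes verification, which is how several solution paths to the same answer can survive.

```python
from itertools import cycle
from typing import Callable


def rejection_sample(
    generate: Callable[[str], str],
    verify: Callable[[str], bool],
    prompt: str,
    num_candidates: int,
) -> list[str]:
    """Sample candidates and keep each distinct one that the verifier accepts."""
    accepted: list[str] = []
    seen: set[str] = set()
    for _ in range(num_candidates):
        candidate = generate(prompt)
        if candidate not in seen and verify(candidate):
            accepted.append(candidate)
        seen.add(candidate)
    return accepted


# Canned outputs standing in for samples drawn from a model.
_fake_outputs = cycle([
    "3 * 4 = 12. Answer: 12",
    "4 + 4 + 4 = 12. Answer: 12",
    "3 + 4 = 7. Answer: 7",
    "3 * 4 = 12. Answer: 12",
])


def fake_generate(prompt: str) -> str:
    return next(_fake_outputs)


def verify_answer(candidate: str) -> bool:
    return candidate.endswith("Answer: 12")


kept = rejection_sample(fake_generate, verify_answer, "What is 3 * 4?", num_candidates=8)
print(kept)
```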
Length Control: Solving Overlong Generation
One of Hermes 4's most innovative contributions addresses the overlong reasoning problem, in which reasoning models generate excessively long chains of thought without terminating. The research team found that their 14B model reached the maximum context length 60% of the time on LiveCodeBench when in reasoning mode.
Their solution is a second supervised fine-tuning stage that teaches models to stop reasoning at exactly 30,000 tokens:
- Generate reasoning traces from the current policy
- Insert </think> tokens at exactly 30,000 tokens
- Train only on the termination decision, not the reasoning chain
- Apply gradient updates only to </think> and <eos> tokens
This approach reduces overlong generation by 78.4% on AIME'24, 65.3% on AIME'25, and 79.8% on LiveCodeBench, at a relative accuracy cost of only 4.7% to 12.7%. By focusing the learning signal entirely on the termination decision, the method avoids the risk of model collapse while teaching effective "counting behavior."
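The steps above can be sketched as a data-preparation routine. This is a minimal illustration under assumptions, not the team's training code: the token ids, the helper names, and the way the answer is appended after the cut are hypothetical, the budget is shrunk from 30,000 to 5 tokens for readability, and the label/input shift is assumed to be handled by the trainer. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so masked positions contribute no gradient.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss
THINK_END_ID = 1     # hypothetical id for the closing reasoning tag
EOS_ID = 2           # hypothetical end-of-sequence id


def truncate_reasoning(reasoning_ids: list[int], answer_ids: list[int], budget: int) -> list[int]:
    """Cut the reasoning at `budget` tokens, close it, then append answer and EOS."""
    return reasoning_ids[:budget] + [THINK_END_ID] + answer_ids + [EOS_ID]


def termination_labels(input_ids: list[int]) -> list[int]:
    """Supervise only the termination tokens; mask every other position."""
    return [tok if tok in (THINK_END_ID, EOS_ID) else IGNORE_INDEX for tok in input_ids]


reasoning = list(range(10, 18))  # 8 reasoning tokens sampled from the current policy
answer = [50, 51]                # tokens of the final answer
sequence = truncate_reasoning(reasoning, answer, budget=5)
labels = termination_labels(sequence)
print(sequence)
print(labels)
```

Since the reasoning tokens themselves are masked, the model is never trained to imitate its own sampled chain of thought; it only learns where to stop.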




Benchmark Performance and Neutral Alignment
Hermes 4 demonstrates state-of-the-art performance among open-weight models. The 405B model achieves 96.3% on MATH-500 (reasoning mode), 81.9% on AIME'24, 78.1% on AIME'25, 70.5% on GPQA Diamond, and 61.3% on LiveCodeBench.
Particularly notable is its performance on RefusalBench, where it achieves 57.1% in reasoning mode, the highest score among the evaluated models and well above GPT-4o (17.67%) and Claude Sonnet 4 (17%). This reflects the model's willingness to engage with controversial topics while maintaining appropriate boundaries, in line with Nous Research's neutral alignment philosophy.


Technical Architecture and Training
Hermes 4 training uses a modified TorchTitan across 192 NVIDIA B200 GPUs. The system handles a highly heterogeneous sample-length distribution through efficient packing (achieving >99.9% batch efficiency), flex attention, and loss masking in which only assistant-role tokens contribute to the cross-entropy loss.
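To show what "batch efficiency" measures, the sketch below packs variable-length samples into fixed-size context windows and reports the fraction of token slots that hold real tokens. The article does not say which packing algorithm is used; first-fit decreasing is swapped in here purely as a familiar heuristic, and the sample lengths are made up.

```python
def pack_first_fit_decreasing(lengths: list[int], capacity: int) -> list[list[int]]:
    """Greedy bin packing: place each sample, longest first, in the first bin with room."""
    bins: list[list[int]] = []
    free: list[int] = []  # remaining token slots per bin
    for length in sorted(lengths, reverse=True):
        if length > capacity:
            raise ValueError("sample is longer than the context window")
        for i, room in enumerate(free):
            if length <= room:
                bins[i].append(length)
                free[i] -= length
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([length])
            free.append(capacity - length)
    return bins


def batch_efficiency(bins: list[list[int]], capacity: int) -> float:
    """Fraction of token slots occupied by real (non-padding) tokens."""
    used = sum(sum(b) for b in bins)
    return used / (len(bins) * capacity)


CONTEXT = 16_384
sample_lengths = [9000, 7000, 16000, 384, 8000, 8384, 300]  # made-up lengths
packed = pack_first_fit_decreasing(sample_lengths, CONTEXT)
print(packed)
print(f"{batch_efficiency(packed, CONTEXT):.4f}")
```

When packed samples share a window, attention must be restricted so that tokens do not attend across sample boundaries, which is presumably where flex attention comes in.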
Training follows a cosine learning rate schedule with 300 warmup steps and 9,000 total steps, at a context length of 16,384 tokens and a global batch size of 384 samples, combining Data Parallelism, Tensor Parallelism, and Fully Sharded Data Parallelism.
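The schedule described above can be written as a small function. The step counts come from the article; the linear shape of the warmup, the decay floor of zero, and the peak learning rate used in the example are assumptions made for illustration.

```python
import math

WARMUP_STEPS = 300   # from the article
TOTAL_STEPS = 9_000  # from the article


def lr_at(step: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Linear warmup (assumed) to peak_lr, then cosine decay to min_lr."""
    if step < WARMUP_STEPS:
        return peak_lr * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)  # hold at min_lr past the final step
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


PEAK_LR = 1e-4  # hypothetical value; the article does not state the peak
for step in (0, 299, 300, 4_650, 9_000):
    print(step, lr_at(step, PEAK_LR))
```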
Abstract
Hermes 4 marks a significant advance in open-source AI development, showing that frontier-level reasoning capabilities can be achieved through transparent, reproducible methodologies without relying on proprietary training data or closed development processes. By combining graph-based synthetic data generation, large-scale rejection sampling, and an elegant length control mechanism, Nous Research has created models that not only match the performance of leading proprietary systems but also maintain the neutral alignment and steerability that make them genuinely useful tools rather than restrictive assistants.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.