
ByteDance Introduces Seed-Prover: An Advanced Formal Reasoning System for Automated Mathematical Theorem Proving


LLMs have shown notable improvements in mathematical reasoning by reasoning in natural language, leading to performance gains on benchmarks such as MATH and AIME. However, reinforcement learning (RL) for training these models faces a challenge: verifying the correctness of natural language proofs is very difficult and requires careful manual checking of each reasoning step. This limits the application of RL to training mathematical theorem-proving models. While formal languages like Lean offer automatic correctness verification, current LLM-based formal provers have limitations of their own. Step-level provers generate code incrementally but require special scaffolding and lack high-level reasoning capabilities.

The ByteDance Seed team introduces Seed-Prover, a lemma-style whole-proof reasoning model. It refines proofs iteratively using Lean feedback, previously established lemmas, and self-summarization. Seed-Prover employs three specialized test-time inference strategies that enable both deep and broad reasoning to solve IMO-level contest problems. Its main innovation is adopting lemma-style proving as its core method, placing lemmas at the center of the reasoning process rather than relying on traditional step-by-step or whole-proof generation methods. The paper also introduces Seed-Geometry, a complementary geometry reasoning engine that compensates for Lean's limited support for geometry.
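To make the lemma-style idea concrete, here is a minimal Lean 4 sketch in which a small lemma is proved on its own and then reused inside a main theorem. The statements and the names `lemma_double` and `main_goal` are invented for illustration and do not come from Seed-Prover's output; the sketch assumes a recent Lean 4 toolchain in which the `omega` tactic is available without extra imports.

```lean
-- Illustrative lemma (hypothetical name): proved once, reusable later.
theorem lemma_double (n : Nat) : n + n = 2 * n := by
  omega

-- Main theorem: rewrite with the established lemma, then close the
-- remaining linear-arithmetic goal.
theorem main_goal (a b : Nat) : (a + a) + (b + b) = 2 * (a + b) := by
  rw [lemma_double a, lemma_double b]
  omega
```

Because each lemma is checked by Lean independently, a prover can keep the lemmas that compile and retry only the parts that fail.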

For the interaction between Seed-Prover and Lean, multi-stage, multi-task RL based on VAPO is used. The training dataset combines open-source datasets with in-house formal problems, using a proposer to create simpler variants of difficult tasks. It excludes overly simple problems with proof rates above 25%. Seed-Geometry's backend supports large-scale problem generation, identifying over 230 million unique problems across seven days with an eightfold improvement in search efficiency. A separate policy model and value model are trained, though extensive testing shows that value models may reduce performance due to estimation errors. As a result, step-by-step generation with beam search is adopted in distributed setups.
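The 25% cutoff described above can be pictured as a simple filter over per-problem proof rates. The sketch below is only an illustration of that rule: the `Problem` class, its fields, the sample data, and the choice to keep problems sitting exactly at 25% are all assumptions, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """Hypothetical record of how often a problem was proved in sampling."""
    name: str
    attempts: int
    proofs_found: int

    @property
    def proof_rate(self) -> float:
        # Fraction of sampled attempts that produced a verified proof.
        return self.proofs_found / self.attempts if self.attempts else 0.0


def filter_training_problems(problems, max_proof_rate=0.25):
    """Drop problems whose proof rate is above the threshold (too easy)."""
    return [p for p in problems if p.proof_rate <= max_proof_rate]


# Made-up sample data for illustration only.
problems = [
    Problem("easy_identity", attempts=8, proofs_found=7),
    Problem("medium_inequality", attempts=8, proofs_found=2),
    Problem("hard_olympiad", attempts=8, proofs_found=0),
]
kept = filter_training_problems(problems)
print([p.name for p in kept])
```

With this sample data, only the problem proved in 7 of 8 attempts is excluded.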

Seed-Prover achieves state-of-the-art results across multiple mathematical benchmarks. For IMO 2025, Seed-Prover fully solves 5 out of 6 problems, with Seed-Geometry instantly solving Problem 2 and Seed-Prover deriving proofs for the remaining problems using various inference settings. On past IMO problems, it proved 121 out of 155 tasks, a 78.1% success rate across all difficulty levels. The performance breakdown shows strong results across problem categories: 47 out of 55 easy problems, 47 out of 56 medium problems, and 27 out of 44 hard problems, with subject-specific results including 72 out of 85 in algebra, 42 out of 55 in number theory, and 7 out of 14 in combinatorics.
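As a quick arithmetic check, the difficulty breakdown quoted above can be summed to recover the overall figure. The snippet uses only the counts stated in the article; the variable names are arbitrary.

```python
# (solved, total) per difficulty level, as reported in the article.
solved_by_difficulty = {
    "easy": (47, 55),
    "medium": (47, 56),
    "hard": (27, 44),
}

total_solved = sum(solved for solved, _ in solved_by_difficulty.values())
total_problems = sum(total for _, total in solved_by_difficulty.values())
overall_rate = round(100 * total_solved / total_problems, 1)

# Per-level success rates, in percent, rounded to one decimal place.
rates = {
    level: round(100 * solved / total, 1)
    for level, (solved, total) in solved_by_difficulty.items()
}

print(total_solved, total_problems, overall_rate)
print(rates)
```

The totals come out to 121 of 155, or 78.1%, matching the headline number, with per-level rates of roughly 85.5%, 83.9%, and 61.4%.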

On MiniF2F, the researchers achieve a 99.6% proof rate on both the validation and test sets under medium settings, solving difficult problems such as IMO 1990 P3. PutnamBench results improve from 201 to 331 solved problems out of 657 when upgrading from light to medium inference settings, a significant performance jump over previous undergraduate-level mathematical reasoning systems. On CombiBench, Seed-Prover solves 30 out of 100 combinatorics problems, outperforming existing methods but revealing continued challenges in combinatorial reasoning. The researchers achieve 81.8% success on MiniCTX-v2, showing strong generalization beyond competition problems and outperforming the o4-mini baseline's 44.3% at Pass@8.
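For readers unfamiliar with the Pass@8 metric, the sketch below shows the unbiased pass@k estimator that is widely used in code-generation evaluation: given n sampled attempts of which c are correct, it estimates the chance that at least one of k attempts succeeds. Whether the MiniCTX-v2 comparison was computed with exactly this estimator is an assumption, and the sample numbers are invented.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct,
    given that c of n generated samples were correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    # 1 minus the probability that all k drawn samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Invented example: 16 samples for one problem, 2 of them verified correct.
example = pass_at_k(n=16, c=2, k=8)
print(round(example, 4))
```

A benchmark-level score is then the average of this quantity over all problems.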

In conclusion, ByteDance Seed presents Seed-Geometry and Seed-Prover, two formal reasoning methods that integrate the capabilities of LLMs. Seed-Geometry provides accelerated verification and enhanced search mechanisms, while Seed-Prover uses iterative refinement and complex test-time inference strategies. Solving 5 out of 6 problems at IMO 2025 shows the practical efficacy of these methods in tackling elite mathematical competitions. The adoption of formal languages like Lean provides rapid proof verification that is more cost-effective than human experts and more reliable than LLM-based judges. Future research will focus on combining formal systems with LLMs to address open conjectures.


Check out the Paper and GitHub Page.


Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.
