

處女座的程序猿 · Posted 2025-06-02 from Shanghai

LLMs / Prover: "DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition" (Translation and Commentary)

Overview: DeepSeek-Prover-V2 proposes a new approach to formal theorem proving that combines the reasoning power of large language models with the rigor of formal verification systems. Through techniques such as recursive subgoal decomposition, curriculum learning, and reinforcement learning, it substantially improves performance on a range of mathematical benchmarks, narrows the gap between informal reasoning and formal proof, and lays a foundation for future research in automated theorem proving.

>> Background and Pain Points

● Reasoning effectiveness: Large language models (LLMs) have made remarkable progress in mathematical problem solving, driven largely by inference-time scaling, particularly natural-language chain-of-thought (CoT) reasoning.

● The challenge of formal theorem proving: Despite the success of natural-language reasoning on competition-level math problems, applying it to formal theorem proving remains fundamentally challenging.

● Informal weaknesses vs. formal requirements: LLMs' natural-language reasoning is inherently informal, relying on heuristics, approximations, and data-driven guessing patterns that often lack the rigorous structure required by formal verification. Formal verification systems (such as Lean, Isabelle, and Coq) rest on strict logical foundations: every proof step must be explicitly constructed and formally verified, with no ambiguity, implicit assumptions, or omitted details.

● Bridging the two: Closing the gap between informal high-level reasoning and the syntactic rigor of formal verification systems is a longstanding challenge in neural theorem proving.

>> Solution

● Introduces DeepSeek-Prover-V2, an open-source large language model for formal theorem proving in Lean 4.

● Collects initialization data through a recursive theorem-proving pipeline powered by DeepSeek-V3. Cold-start training begins by prompting DeepSeek-V3 to decompose complex problems into sequences of subgoals. Proofs of the resolved subgoals are synthesized into a chain-of-thought process and combined with DeepSeek-V3's step-by-step reasoning to create an initial cold start for reinforcement learning.

● Integrates informal and formal mathematical reasoning into a single unified model.

● Builds a simple recursive theorem-proving pipeline that uses DeepSeek-V3 as a unified tool for both subgoal decomposition and formalization.

● Prompts DeepSeek-V3 to decompose theorems into high-level proof sketches while formalizing those proof steps as a sequence of subgoals in Lean 4. A smaller 7B model handles the proof search for each subgoal, reducing the computational burden.

● Introduces a curriculum learning framework that uses the decomposed subgoals to generate conjectural theorems, progressively increasing training-task difficulty to better guide the model's learning. Complete step-by-step formal proofs are paired with DeepSeek-V3's corresponding chain-of-thought to create cold-start reasoning data.

● Applies a reinforcement learning stage to further strengthen the connection between informal mathematical reasoning and formal proof construction.

>> Core Steps

● Subgoal decomposition: DeepSeek-V3 decomposes a complex theorem into smaller, manageable subgoals (lemmas).

● Formal sketch: The decomposed subgoals are translated into formal Lean 4 statements, with proof details omitted and marked by `sorry` placeholders.

● Recursive solving: A smaller 7B prover model recursively solves each subgoal, using earlier subgoals as premises.

● Proof assembly: The subgoal proofs are composed into a complete formal proof of the original problem.

● Cold-start data generation: Complete formal proofs are combined with DeepSeek-V3's chain-of-thought to create high-quality cold-start training data.

● Curriculum learning: The subgoals are used to generate training tasks of increasing difficulty, gradually guiding the prover model toward more challenging problems.

● Reinforcement learning: Binary correct/incorrect feedback serves as the reward signal, augmented with a consistency reward that ensures the structure of generated proofs matches the chain-of-thought decomposition.
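To make the sketch stage concrete, here is a toy Lean 4 illustration (the statement and subgoal names are invented for illustration, not taken from the paper): the decomposition emits `have` steps whose proofs are left as `sorry` placeholders, each placeholder is dispatched to the 7B prover as an independent subgoal, and the final line assembles the pieces.

```lean
-- Toy sketch stage: subgoals stated with `sorry` placeholders.
theorem add_sq (a b : ℕ) : (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by
  -- Subgoal 1: rewrite the square as a product.
  have h1 : (a + b) ^ 2 = (a + b) * (a + b) := by sorry
  -- Subgoal 2: expand the product (h1 is available as a premise).
  have h2 : (a + b) * (a + b) = a ^ 2 + 2 * a * b + b ^ 2 := by sorry
  -- Assembly: chain the subgoal proofs into the full proof.
  rw [h1, h2]
```

With mathlib available, each `sorry` here could be closed by `ring`; in the actual pipeline the 7B prover searches for such closing proofs.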

>> Advantages

● DeepSeek-Prover-V2-671B achieves state-of-the-art performance in neural theorem proving: an 88.9% pass ratio on MiniF2F-test, 49 of 658 problems solved on PutnamBench, and 6 of the 15 AIME problems in ProverBench.

● Narrows the gap between formal and informal mathematical reasoning in large language models.

● By integrating a general-purpose LLM with a lightweight specialized 7B prover, the pipeline reaches a 90.2% success rate on miniF2F-valid.

● The CoT reasoning mode shows a significant performance advantage over the non-CoT mode in formal mathematical reasoning.

● The 7B model performs surprisingly well on PutnamBench in the non-CoT generation mode, solving 13 problems that the 671B version could not.

>> Conclusions and Takeaways

● Synthesizing cold-start reasoning data effectively boosts formal theorem-proving capability.

● A recursive theorem-proving framework combining subgoal decomposition and formalization is a promising approach.

● Curriculum learning and reinforcement learning can further strengthen a model's formal theorem-proving ability.

● High-capacity models may internalize and externalize intermediate reasoning even without explicit CoT prompting.

● Future work should scale this paradigm to an AlphaProof-like system aimed at IMO-level problems at the frontier of automated theorem proving.

● Further exploration is suggested into how LLMs' informal reasoning can guide the construction of formal proofs.

● More effective reward functions should be studied to encourage well-structured, readable formal proofs.

Contents


LLMs / Prover: "DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition" (Translation and Commentary)

Link

Paper: [2504.21801] DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

Date

April 30, 2025

Author

DeepSeek-AI

Abstract

We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model. The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. In addition to standard benchmarks, we introduce ProverBench, a collection of 325 formalized problems, to enrich our evaluation, including 15 selected problems from the recent AIME competitions (years 24-25). Further evaluation on these 15 AIME problems shows that the model successfully solves 6 of them. In comparison, DeepSeek-V3 solves 8 of these problems using majority voting, highlighting that the gap between formal and informal mathematical reasoning in large language models is substantially narrowing.


Figure 1: Benchmark performance of DeepSeek-Prover-V2. On the AIME benchmark, DeepSeek-V3 is evaluated using the standard find-answer task for natural-language reasoning, while prover models generate Lean code to construct formal proofs for a given correct answer.

1. Introduction

The emergence of reasoning capabilities in large language models (LLMs) has revolutionized numerous areas of artificial intelligence, particularly in the domain of mathematical problem solving (DeepSeek-AI, 2025). These advancements are largely enabled by the paradigm of inference-time scaling, most notably through natural language chain-of-thought reasoning (Jaech et al., 2024). Rather than relying solely on a single forward pass to arrive at an answer, LLMs can reflect on intermediate reasoning steps, improving both accuracy and interpretability. Despite the success of natural language reasoning in solving competition-level mathematical problems, its application to formal theorem proving remains fundamentally challenging. LLMs perform natural language reasoning in an inherently informal manner, relying on heuristics, approximations, and data-driven guessing patterns that often lack the rigorous structure required by formal verification systems. In contrast, proof assistants such as Lean (Moura and Ullrich, 2021), Isabelle (Paulson, 1994), and Coq (Barras et al., 1999) operate on strict logical foundations, where every proof step must be explicitly constructed and formally verified. These systems permit no ambiguity, implicit assumptions, or omission of details. Bridging the gap between informal, high-level reasoning and the syntactic rigor of formal verification systems remains a longstanding research challenge in neural theorem proving (Yang et al., 2024).

To harness the strengths of informal mathematical reasoning in support of formal theorem proving, a classical approach is to hierarchically decompose formal proofs based on the guidance of natural-language proof sketches. Jiang et al. (2023) proposed a framework, called Draft, Sketch, and Prove (DSP), that leverages a large language model to generate proof sketches in natural language, which are subsequently translated into formal proof steps. This informal-to-formal theorem proving paradigm closely mirrors the concept of subgoals in hierarchical reinforcement learning (Barto and Mahadevan, 2003; Nachum et al., 2018; Eppe et al., 2022), where complex tasks are broken down into a hierarchy of simpler subtasks that can be solved independently to progressively achieve the overarching objective. In formal theorem proving, a subgoal is typically an intermediate proposition or lemma that contributes to the proof of a larger theorem (Zhao et al., 2023, 2024). This hierarchical decomposition aligns with human problem-solving strategies and supports modularity, reusability, and more efficient proof search (Wang et al., 2024b; Zheng et al., 2024). Recent studies have extended this paradigm by employing multi-tiered hierarchies for structured proof generation (Wang et al., 2024a), and by leveraging reinforcement learning techniques to optimize the decomposition of complex theorems into manageable subgoals (Dong et al., 2024).


In this paper, we develop a reasoning model for subgoal decomposition, leveraging a suite of synthetic cold-start data and large-scale reinforcement learning to enhance its performance. To construct the cold-start dataset, we develop a simple yet effective pipeline for recursive theorem proving, utilizing DeepSeek-V3 (DeepSeek-AI, 2024) as a unified tool for both subgoal decomposition and formalization. We prompt DeepSeek-V3 to decompose theorems into high-level proof sketches while simultaneously formalizing these proof steps in Lean 4, resulting in a sequence of subgoals. Since the subgoal decomposition is powered by a large general-purpose model, we use a smaller 7B model to handle the proof search for each subgoal, thereby reducing the associated computational burden. Additionally, we introduce a curriculum learning framework that leverages the decomposed subgoals to generate conjectural theorems, progressively increasing the difficulty of training tasks to better guide the model's learning process. Once the decomposed steps of a challenging problem are resolved, we pair the complete step-by-step formal proof with the corresponding chain-of-thought from DeepSeek-V3 to create cold-start reasoning data. Based on the cold start, a subsequent reinforcement learning stage is applied to further strengthen the connection between informal mathematical reasoning and formal proof construction. Our experiments show that reinforcement learning starting from the cold start of informal reasoning in task decomposition significantly enhances the model's capabilities in formal theorem proving. The resulting DeepSeek-Prover-V2-671B model establishes a new state-of-the-art in neural theorem proving across multiple benchmarks. On MiniF2F-test, it achieves 82.4% accuracy with Pass@32, improving to 88.9% with Pass@8192. The model shows strong generalization capabilities to college-level theorem proving, solving 37.1% of ProofNet-test problems with Pass@1024 and tackling 49 out of 658 challenging PutnamBench problems. Additionally, we contribute ProverBench, a benchmark dataset containing 325 formalized problems to advance neural theorem proving research, including 15 from the prestigious AIME competitions (years 24-25). DeepSeek-Prover-V2-671B successfully solves 6 of these 15 challenging AIME problems, further demonstrating its sophisticated mathematical reasoning capabilities.
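A note on the Pass@k figures quoted above: given n sampled proof attempts of which c verify, pass@k is commonly reported with the unbiased estimator of Chen et al. (2021), pass@k = 1 − C(n−c, k)/C(n, k). This is an assumption about the evaluation protocol, not a detail stated in this excerpt; a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    attempts drawn without replacement from n samples (c of them correct)
    is correct. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 samples of which 1 verifies, pass@1 is 0.5, matching the intuition that a single random draw succeeds half the time.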

在本文中,我們開發(fā)了一種用于子目標(biāo)分解的推理模型,利用一套合成的冷啟動(dòng)數(shù)據(jù)和大規(guī)模強(qiáng)化學(xué)習(xí)來提升其性能。為了構(gòu)建冷啟動(dòng)數(shù)據(jù)集,我們開發(fā)了一個(gè)簡單而有效的遞歸定理證明流水線,利用 DeepSeek-V3(DeepSeek-AI,2024)作為子目標(biāo)分解和形式化的統(tǒng)一工具。我們提示 DeepSeek-V3 將定理分解為高級(jí)證明草圖,同時(shí)在 Lean 4 中對(duì)這些證明步驟進(jìn)行形式化,從而形成一系列子目標(biāo)。由于子目標(biāo)分解是由一個(gè)大型通用模型驅(qū)動(dòng)的,我們使用一個(gè)較小的 7B 模型來處理每個(gè)子目標(biāo)的證明搜索,從而減輕相關(guān)的計(jì)算負(fù)擔(dān)。此外,我們引入了一個(gè)課程學(xué)習(xí)框架,利用分解的子目標(biāo)生成推測(cè)性定理,逐步增加訓(xùn)練任務(wù)的難度,以更好地引導(dǎo)模型的學(xué)習(xí)過程。一旦具有挑戰(zhàn)性問題的分解步驟得到解決,我們就將完整的分步形式證明與 DeepSeek-V3 對(duì)應(yīng)的思維鏈配對(duì),以創(chuàng)建冷啟動(dòng)推理數(shù)據(jù)?;诶鋯?dòng),隨后應(yīng)用強(qiáng)化學(xué)習(xí)階段,進(jìn)一步加強(qiáng)非形式化數(shù)學(xué)推理與形式化證明構(gòu)建之間的聯(lián)系。我們的實(shí)驗(yàn)表明,從任務(wù)分解中的非形式化推理冷啟動(dòng)開始的強(qiáng)化學(xué)習(xí)顯著增強(qiáng)了模型在形式化定理證明方面的能力。由此產(chǎn)生的 DeepSeek-Prover-V2-671B 模型在多個(gè)基準(zhǔn)測(cè)試中確立了神經(jīng)定理證明的新標(biāo)桿。在 MiniF2F-test 上,它實(shí)現(xiàn)了 82.4% 的準(zhǔn)確率,Pass@32 為 88.9%,Pass@8192 為 88.9%。該模型在大學(xué)水平的定理證明方面表現(xiàn)出強(qiáng)大的泛化能力,在 ProofNet-test 上解決了 37.1% 的問題,Pass@1024 為 49 個(gè),解決了 658 個(gè)具有挑戰(zhàn)性的 PutnamBench 問題中的 49 個(gè)。此外,我們貢獻(xiàn)了 ProverBench,這是一個(gè)包含 325 個(gè)形式化問題的基準(zhǔn)數(shù)據(jù)集,旨在推動(dòng)神經(jīng)定理證明研究的發(fā)展,其中包括 15 道來自著名的 AIME 競賽(第 24 至 25 屆)的題目。DeepSeek-Prover-V2-671B 成功解決了這 15 道極具挑戰(zhàn)性的 AIME 題目中的 6 道,進(jìn)一步證明了其復(fù)雜的數(shù)學(xué)推理能力。

Conclusion

In this work, we propose a comprehensive pipeline for synthesizing cold-start reasoning data to advance formal theorem proving. Our data construction process is grounded in a recursive theorem-proving framework, wherein DeepSeek-V3 serves as a unified model for both subgoal decomposition and lemma formalization within the Lean 4 proof assistant. Our approach combines high-level proof sketches with formal steps, creating a sequence of manageable subgoals that can be efficiently solved using a smaller 7B model, significantly reducing computational requirements. The curriculum learning framework we developed uses these decomposed subgoals to generate increasingly difficult training tasks, creating a more effective learning progression. By pairing complete formal proofs with DeepSeek-V3’s chain-of-thought reasoning, we established valuable cold-start reasoning data that bridges informal mathematical thinking with formal proof structures. The subsequent reinforcement learning stage substantially enhanced this connection, leading to significant improvements in formal theorem proving capabilities. The resulting model, DeepSeek-Prover-V2-671B, consistently outperforms all baselines across a range of benchmarks, spanning both high-school competition problems and undergraduate-level mathematics. Our future work will focus on scaling this paradigm to an AlphaProof-like system with the ultimate aim of tackling IMO-level mathematical problems that represent the frontier of automated theorem proving challenges.

