MiniF2F & Open-Source Lean4 Theorem Proving
Benchmark context for developers searching for an open-source Lean4 theorem prover with strong MiniF2F-Test scores.
What is MiniF2F?
MiniF2F is a standard benchmark for neural theorem proving: a collection of olympiad-style problems formalized as statements in a proof assistant (typically Lean), where a submitted proof counts only if the proof assistant accepts it. MiniF2F-Test is the held-out split used to compare prover models fairly.
Unlike math QA benchmarks that only require a numeric answer, MiniF2F requires a complete, machine-verifiable proof in Lean4.
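To make the difference concrete, here is an illustrative statement in the style of a MiniF2F problem. It is a hypothetical example, not taken from the benchmark, and it assumes a project with Mathlib available. A QA benchmark would accept the answer "4"; here the prover must supply the text after `:= by` so that Lean accepts the whole theorem.

```lean
import Mathlib

-- Hypothetical MiniF2F-style problem (not an actual benchmark entry):
-- "If 2x + 3 = 11, show that x = 4."
theorem illustrative_linear (x : ℝ) (h : 2 * x + 3 = 11) : x = 4 := by
  -- `linarith` closes linear arithmetic goals from the hypotheses in scope.
  linarith
```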
LongCat-Flash-Prover on MiniF2F-Test
LongCat-Flash-Prover is an open-source Lean4 theorem prover from Meituan LongCat. With Tool-Integrated Reasoning (TIR) and a 72-attempt budget, it reports:
- 97.1% pass rate on MiniF2F-Test
- 100% auto-formalization on MiniF2F-Test & ProofNet
- 46.7% on MathOlympiad-Bench (180-attempt budget)
- 41.5% on PutnamBench (118-attempt budget)
The model decomposes proving into auto-formalization, sketching, and whole-proof generation with Lean4 server verification.
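The three stages and the attempt budget can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not LongCat-Flash-Prover's actual implementation: `formalize`, `sketch`, `complete`, and `verify` are hypothetical callables standing in for the model calls and the Lean4 server, and `VerifyResult` is an invented container for the server's response.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VerifyResult:
    """Hypothetical wrapper for a Lean4 server response."""
    ok: bool
    message: str = ""


def prove(
    informal: str,
    formalize: Callable[[str], str],      # informal text -> Lean4 statement
    sketch: Callable[[str, str], str],    # (statement, feedback) -> proof outline
    complete: Callable[[str, str], str],  # (statement, outline) -> whole proof
    verify: Callable[[str], VerifyResult],
    budget: int = 72,
) -> Optional[str]:
    """Return the first verified proof, or None once the budget is spent."""
    statement = formalize(informal)
    feedback = ""
    for _ in range(budget):
        outline = sketch(statement, feedback)
        candidate = complete(statement, outline)
        result = verify(candidate)
        if result.ok:
            return candidate
        # Tool-integrated reasoning: feed the verifier error into the next attempt.
        feedback = result.message
    return None
```

In this sketch a problem counts as solved if any attempt within the budget verifies, which is why the budget has to be reported alongside the score.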
Why Lean4 for open-source provers?
Lean4 provides machine-checked verification: tactics elaborate into a proof term that the kernel type-checks, so a proof is accepted only if every step is justified. Open-source prover stacks typically pair a language model with a Lean4 server, search, and TIR feedback loops.
Scores depend on attempt budget, sampling, and verifier configuration, so always cite the evaluation protocol when comparing models.
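The effect of the attempt budget is easy to see with a small calculation. The sketch below assumes a hypothetical log that records, for each problem, the attempt number of its first verified proof (or `None` if no attempt succeeded); the problem names and numbers are invented for illustration and are not results from any model.

```python
from typing import Dict, Optional


def pass_rate_at_budget(first_success: Dict[str, Optional[int]], budget: int) -> float:
    """Fraction of problems whose first verified proof came within `budget` attempts."""
    if not first_success:
        raise ValueError("no problems in the log")
    solved = sum(
        1 for attempt in first_success.values()
        if attempt is not None and attempt <= budget
    )
    return solved / len(first_success)


# Invented data: attempt index (1-based) of the first verified proof per problem.
log = {"p1": 1, "p2": 8, "p3": 40, "p4": None}

for budget in (1, 32, 72):
    print(budget, pass_rate_at_budget(log, budget))
```

On this toy log the same model scores 0.25 at budget 1, 0.5 at budget 32, and 0.75 at budget 72, so two reported pass rates are only comparable when the budgets match.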