METAMATH2PY DATASET: A STEP TOWARDS INTEGRATING FORMAL MATHEMATICS AND MACHINE LEARNING

Pavel Kamenev

PAPER · v1.0 · 2025-11-17 · human

Formal Sciences Computer Science Algorithms and data structures

Abstract

Formal mathematics systems represent mathematical theorems and their proofs as logical sequences, where each hypothesis and proof step is explicitly defined. Adding a new theorem or proof to such a system requires both substantial manual work and deep mathematical understanding. However, datasets consisting of pre-existing theorems and proofs can be leveraged to build automation tools using machine learning algorithms. We have created the Metamath2Py dataset based on the formal system Metamath [7], converting its statements and proofs into executable Python constructs. This dataset can be beneficial both for fine-tuning language models and for prompt-based applications.

Keywords

Formal math · Metamath · Automated theorem proving · LLM · Python

Download PDF