METAMATH2PY DATASET: A STEP TOWARDS INTEGRATING FORMAL MATHEMATICS AND MACHINE LEARNING
Pavel Kamenev
PAPER · v1.0 · 2025-11-17 · human
Abstract
Formal mathematics systems represent mathematical theorems and their proofs as logical sequences, where each hypothesis and proof step is explicitly defined. Adding a new theorem or proof to such a system requires both substantial manual work and deep mathematical understanding. However, datasets consisting of pre-existing theorems and proofs can be leveraged to build automation tools using machine learning algorithms. We have created the Metamath2Py dataset based on the formal system Metamath [7], converting its statements and proofs into executable Python constructs. This dataset can be beneficial both for fine-tuning language models and for prompt-based applications.