Automated Theorem Proving on miniF2F
[Chart: Accuracy. Top entry: KG-Prover, 31.97 (Feb 4, 2025). Updated 8d ago.]
Evaluation Results
Method      Configuration                Date      Accuracy
KG-Prover   LLM Model=Llama 3.1 8B...    2025.02   31.97
KG-Prover   LLM Model=Claude 3.5 S...    2025.02   31.15
KG-Prover   LLM Model=Llama 3.3 70...    2025.02   30.74
KG-Prover   LLM Model=GPT 4o, Max...     2025.02   30.74
KG-Prover   LLM Model=o1-mini, Max...    2025.02   30.74
RAG         LLM Model=Claude 3.5 S...    2025.02   28.69
RAG         LLM Model=GPT 4o, Max...     2025.02   28.69
RAG         LLM Model=o1-mini, Max...    2025.02   28.28
KG-Prover   LLM Model=Deepseek R1,...    2025.02   28.28
Base        LLM Model=Llama 3.3 70...    2025.02   25
RAG         LLM Model=Llama 3.1 8B...    2025.02   24.59
RAG         LLM Model=Llama 3.3 70...    2025.02   24.59
Base        LLM Model=o1-mini, Max...    2025.02   23.77
Base        LLM Model=GPT 4o, Max...     2025.02   23.36
Base        LLM Model=Claude 3.5 S...    2025.02   22.95
RAG         LLM Model=Deepseek R1,...    2025.02   22.54
Base        LLM Model=Llama 3.1 8B...    2025.02   20.49
Base        LLM Model=Deepseek R1,...    2025.02   20.08