Automated Theorem Proving on MUSTARDSAUCE
[Chart: Accuracy over time. Leading entry: KG-Prover, 34 (Feb 4, 2025).]
Updated 8d ago
Evaluation Results
| Method | Details | Date | Accuracy |
| --- | --- | --- | --- |
| KG-Prover | LLM Model=o1-mini, Max... | 2025.02 | 34 |
| KG-Prover | LLM Model=Llama 3.3 70... | 2025.02 | 32.5 |
| KG-Prover | LLM Model=Claude 3.5 S... | 2025.02 | 30 |
| KG-Prover | LLM Model=GPT 4o, Max... | 2025.02 | 30 |
| RAG | LLM Model=Llama 3.3 70... | 2025.02 | 28.8 |
| RAG | LLM Model=Claude 3.5 S... | 2025.02 | 28.4 |
| Base | LLM Model=Claude 3.5 S... | 2025.02 | 28 |
| Base | LLM Model=GPT 4o, Max... | 2025.02 | 28 |
| RAG | LLM Model=Llama 3.1 8B... | 2025.02 | 28 |
| RAG | LLM Model=GPT 4o, Max... | 2025.02 | 28 |
| KG-Prover | LLM Model=Llama 3.1 8B... | 2025.02 | 27.6 |
| KG-Prover | LLM Model=Deepseek R1,... | 2025.02 | 27 |
| RAG | LLM Model=o1-mini, Max... | 2025.02 | 26.8 |
| Base | LLM Model=Llama 3.3 70... | 2025.02 | 25.6 |
| RAG | LLM Model=Deepseek R1,... | 2025.02 | 25 |
| Base | LLM Model=o1-mini, Max... | 2025.02 | 24.8 |
| Base | LLM Model=Llama 3.1 8B... | 2025.02 | 24 |
| Base | LLM Model=Deepseek R1,... | 2025.02 | 20 |