
TheoremQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | TheoremQA | Accuracy: 43.1 | 55 |
| Theorem-based Reasoning | TheoremQA | Score: 53 | 34 |
| Reasoning Quality Assessment | TheoremQA | AUROC: 0.873 | 32 |
| Physics | TheoremQA | Accuracy: 58.8 | 28 |
| Mathematical Reasoning | TheoremQA (test) | Accuracy: 48.4 | 28 |
| Mathematical Reasoning | TheoremQA | Pass@1: 24.7 | 18 |
| Mathematical Reasoning | TheoremQA | Pass@1: 34.1 | 18 |
| STEM Reasoning | TheoremQA | Avg@2: 55.4 | 16 |
| Question Answering | TheoremQA | Accuracy: 15 | 16 |
| Reasoning | TheoremQA | AUROC: 88.87 | 14 |
| Theorem Proving | TheoremQA | Accuracy: 13.5 | 13 |
| Mathematical Problem Solving | TheoremQA TQ-Math | Exact Match Accuracy: 57.7 | 12 |
| Retrieval-Augmented Generation | TheoremQA | Accuracy: 66.3 | 12 |
| Theorem Question Answering | TheoremQA standard (test) | Accuracy: 56 | 12 |
| Scientific Reasoning | TheoremQA | Accuracy: 42.3 | 11 |
| Scientific Reasoning | TheoremQA (test) | Accuracy: 48.4 | 9 |
| STEM Reasoning | TheoremQA | Accuracy: 36.8 | 8 |
| General Reasoning | TheoremQA | Average@2: 36.3 | 7 |
| Theorem-based Question Answering | TheoremQA | Accuracy: 56.13 | 7 |
| Alpha-law validation | TheoremQA | Alpha: 1.312 | 2 |