
TheoremQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Scientific Reasoning | TheoremQA | Accuracy 82.49 | 68 |
| Mathematical Reasoning | TheoremQA | Accuracy 49.25 | 64 |
| Theorem-based Reasoning | TheoremQA | Score 53 | 34 |
| Reasoning Quality Assessment | TheoremQA | AUROC 0.873 | 32 |
| Science and Engineering Question Answering | TheoremQA | Accuracy 68.04 | 31 |
| Physics | TheoremQA | Accuracy 58.8 | 28 |
| Mathematical Reasoning | TheoremQA (test) | Accuracy 48.4 | 28 |
| Targeted error generation | TheoremQA Tier-1 (first-20 sweep) | Targeted Error Rate 54 | 27 |
| Mathematical Reasoning | TheoremQA | Pass@1 24.7 | 18 |
| Mathematical Reasoning | TheoremQA | Pass@1 34.1 | 18 |
| STEM Reasoning | TheoremQA | Avg@2 55.4 | 16 |
| Question Answering | TheoremQA | Accuracy 15 | 16 |
| Mathematical Reasoning | TheoremQA | ThmQA Score 57.88 | 15 |
| Reasoning | TheoremQA | AUROC 88.87 | 14 |
| Theorem-based Question Answering | TheoremQA | Accuracy 86.32 | 13 |
| Theorem Proving | TheoremQA | Accuracy 13.5 | 13 |
| STEM Theorem Question Answering | TheoremQA | Acceptance Length 4.4 | 12 |
| Mathematical Problem Solving | TheoremQA TQ-Math | Exact Match Accuracy 57.7 | 12 |
| Retrieval-Augmented Generation | TheoremQA | Accuracy 66.3 | 12 |
| Theorem Question Answering | TheoremQA standard (test) | Accuracy 56 | 12 |
| Skill retrieval | TheoremQA | nDCG@1 77.4 | 11 |
| Coding | TheoremQA | Accuracy 55.38 | 10 |
| General Reasoning | TheoremQA | Accuracy (General Reasoning) 32.47 | 9 |
| Scientific Reasoning | TheoremQA (test) | Accuracy 48.4 | 9 |
| Theorem Question Answering | TheoremQA (test) | Accuracy 87.4 | 8 |
Showing 25 of 29 rows