MultiArith

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Arithmetic Reasoning | MultiArith | Accuracy: 100 | 293
Mathematical Reasoning | MultiArith | Accuracy: 100 | 143
Arithmetic Reasoning | MultiArith (test) | Accuracy: 99.3 | 115
Math Reasoning | MultiArith | Accuracy: 98.3 | 65
Math reasoning | MultiArith (test) | Accuracy: 99.59 | 54
Mathematical Reasoning | MultiArith | Original Accuracy: 99 | 40
Mathematical reasoning | MultiArith Out of Distribution | Top-1 Accuracy (Maj@1): 100 | 30
Group Collusive Attack Detection | MultiArith | Detection Accuracy: 92 | 27
Question Answering | MultiArith | Accuracy: 74.3 | 24
mathematical reasoning | MultiArith | Accuracy: 99.15 | 19
Mathematical Reasoning | MultiArith | Pass@1: 100 | 18
Mathematical Reasoning | MultiArith | Accuracy: 100 | 16
Mathematical Reasoning | MultiArith | Accuracy: 95 | 15
Math Reasoning | MultiArith | Accuracy: 98.3 | 14
Follow-up Questioning Consistency | MultiArith (unseen) | Average Success Count: 18.33 | 12
Mathematical Reasoning | MultiArith | Accuracy: 43.33 | 10
Mathematical Reasoning | MultiArith | Accuracy (Clean): 99.44 | 8
Mathematical Reasoning | MultiArith (val) | Tokens at Best Step (K): 1,640 | 7
Mathematical Reasoning | MultiArith OOD | Base Accuracy (CA): 100 | 2
Arithmetic Reasoning | MultiArith | Accuracy (format-specific prompt): 78.7 | 2
Mathematical Reasoning | MultiArith | Initial Accuracy: 96.67 | 1
Mathematical Reasoning | MultiArith | Detection Rate: 93.4 | 1