
AQUA

Benchmarks

Task Name                           Dataset Name  SOTA Result                              Trend
Mathematical Reasoning              AQUA          Accuracy: 85.05                          132
Arithmetic Reasoning                AQuA (test)   Accuracy: 74.63                          58
Hallucination Detection             AQuA          AUROC: 0.7822                            31
Multiple-choice Question Answering  AQuA          Accuracy: 87.4                           31
Arithmetic Reasoning                AQUA          Accuracy: 77.1                           31
Symbolic Reasoning                  AQUA          Accuracy: 80.3                           26
Algebraic Reasoning                 AQUA          Accuracy: 79.1                           15
Mathematical Reasoning              AQUA (test)   Accuracy: 72.44                          6
Arithmetic Reasoning                AQUA          Accuracy (format-specific prompt): 33.5  2
Algebraic Reasoning                 AQUA (test)   Accuracy: -                              0