Algebraic Reasoning on AQUA
Best result: 79.1 Accuracy (QAP50)
[Chart: Accuracy over time, Jul 4, 2024 to Jan 15, 2026]
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| QAP50 | Model=GPT-4 Turbo, Con... | 2024.07 | 79.1 |
| Baseline | Model=GPT-4 Turbo | 2024.07 | 78.7 |
| TADB | Model=GPT-4 Turbo | 2024.07 | 78.7 |
| QAP150 | Model=GPT-4 Turbo, Con... | 2024.07 | 78 |
| QAP25 | Model=GPT-4 Turbo, Con... | 2024.07 | 77.6 |
| QAP200 | Model=GPT-4 Turbo, Con... | 2024.07 | 76.4 |
| QAP100 | Model=GPT-4 Turbo, Con... | 2024.07 | 75.6 |
| CoT | Model=GPT-4 Turbo | 2024.07 | 74.4 |
| PS+ | Model=GPT-4 Turbo | 2024.07 | 52.8 |
| Qwen3-8B | Category=Naive LLM | 2026.01 | 33.45 |
| VIST2-8B | Category=Ours | 2026.01 | 32.1 |
| Qwen3-VL-8B | Category=Visual-enhanc... | 2026.01 | 29.88 |
| Qwen3-4B | Category=Naive LLM | 2026.01 | 28.4 |
| VIST2-4B | Category=Ours | 2026.01 | 27.95 |
| Qwen3-VL-4B | Category=Visual-enhanc... | 2026.01 | 25.96 |