Decision Making on (test)
[Chart: ACC1 Score by date. Highlighted point: 4o, ACC1 Score 14.2 (Dec 19, 2024). Selectable metrics: ACC1 Score, Delta Accuracy, Symbolic Result (√ → X).]
Updated 4d ago
Evaluation Results
Method      Model Family   Date      ACC1 Score   Delta Accuracy   Symbolic Result (√ → X)
4o          ChatGPT        2024.12   14.2         20.9             76.6
3.5-turbo   ChatGPT        2024.12   7.5          5.2              76.5
o1-mini     ChatGPT        2024.12   1.5          8.2              92.3