Solution Simulation on Student_15 v1.0 (test)
Top result: 3.65, Con2 (Refine)
[Figure: score over time; Con2 result plotted at May 26, 2025]
Evaluation Results
Method         Configuration              Date     Score
Con2 (Refine)  Model=GPT-4o, Behavior...  2025.05  3.65
CoT            Model=GPT-4o, Behavior...  2025.05  3.5
Refine         Model=GPT-3.5, Behavio...  2025.05  3.49
Refine         Model=GPT-3.5, Behavio...  2025.05  3.39
CoT            Model=GPT-3.5, Behavio...  2025.05  3.35
IO             Model=GPT-3.5, Behavio...  2025.05  3.34
IO             Model=GPT-4o, Behavior...  2025.05  3.32
Refine         Model=Claude-3.5-Sonne...  2025.05  2.99
Refine         Model=LLaMA-3.3-70B-In...  2025.05  2.69
IO             Model=Claude-3.5-Sonne...  2025.05  2.57
Refine         Model=Claude-3.5-Sonne...  2025.05  2.53
CoT            Model=Claude-3.5-Sonne...  2025.05  2.43
Refine         Model=LLaMA-3.3-70B-In...  2025.05  1.8
CoT            Model=LLaMA-3.3-70B-In...  2025.05  1.79
IO             Model=LLaMA-3.3-70B-In...  2025.05  1.7