Overall Evaluation on Principle-based evaluation dataset
[Chart: Average score. Highlighted entry: Claude 4.5-Haiku, 8.41, Dec 2, 2025]
Evaluation Results
Method             Access Type   Date      Average
Claude 4.5-Haiku   API-based     2025.12   8.41
Claude 4.5-Sonnet  API-based     2025.12   8.41
GPT-5.1            API-based     2025.12   8.36
Claude 3.5-Haiku   API-based     2025.12   8.33
GPT-4.1            API-based     2025.12   8.3
Qwen3-8B           Open-source   2025.12   8.23
Qwen3-4B           Open-source   2025.12   8.21
GPT-5              API-based     2025.12   8.21
Qwen3-1.7B         Open-source   2025.12   8.03
Qwen2.5-7B         Open-source   2025.12   7.97
Llama-3.1-8B       Open-source   2025.12   7.86
Llama-3.2-3B       Open-source   2025.12   7.81