Human Evaluation on 50 randomly selected model responses
[Chart: Clarity. Highlighted value: GPT-4.1, 98, Nov 28, 2025. Selectable metrics: Clarity, Relevance, Cultural Appropriateness.]
Updated 4d ago
Evaluation Results
Method                   Date      Clarity   Relevance   Cultural Appropriateness
GPT-4.1                  2025.11   98        96          98
Gemma-3-4B-it            2025.11   84        82          82
Phi-3-mini-4k-instruct   2025.11   70        74          64