Human Expert Alignment on FeedEval
[Chart: Specificity Accuracy by method. Top result: Gemma3-Inst., 83.2 (Jan 8, 2026).]
Evaluation Results
| Method | Details | Date | Specificity Accuracy | Specificity F1 Score | Helpfulness Accuracy | Helpfulness F1 Score | Validity Accuracy | Validity F1 Score |
|---|---|---|---|---|---|---|---|---|
| Gemma3-Inst. | Model Scale=3B-scale,... | 2026.01 | 83.2 | 89.3 | 85.3 | 89.5 | 82.2 | 70.2 |
| Llama3-3B-Inst. | Model Scale=3B-scale,... | 2026.01 | 82 | 88 | 86.4 | 91.2 | 83.5 | 70.9 |
| Phi-3-Mini | Model Scale=3B-scale,... | 2026.01 | 81.1 | 86 | 87.1 | 92 | 82 | 70 |
| Qwen2-3B-Inst. | Model Scale=3B-scale,... | 2026.01 | 80.7 | 87 | 75.5 | 82.4 | 83.3 | 70.3 |
| Gemini-2.5-Pro | Fine-tuning Status=frozen | 2026.01 | 75.7 | 84.5 | 55.6 | 62.2 | 61.3 | 41.7 |
| GPT-5.1 | Fine-tuning Status=frozen | 2026.01 | 72.9 | 83.3 | 58.4 | 69.7 | 64 | 44.5 |