Human Evaluation on Human Evaluation
Chart (Trustworthiness): 0.86, Ann Brown, Feb 21, 2026
Metrics: Trustworthiness, Self-Awareness, Real-World Preference
Updated 4d ago
Evaluation Results
Method     Configuration              Date     Trustworthiness  Self-Awareness  Real-World Preference
Ann Brown  baseline_comparison=vs...  2026.02  0.86             0.894           0.784
Ann Brown  baseline_comparison=vs...  2026.02  0.841            0.842           0.8
Ann Brown  baseline_comparison=vs...  2026.02  0.838            0.792           0.829
Ann Brown  baseline_comparison=vs...  2026.02  0.82             0.833           0.784