Human-Metric Correlation on HelpSteer2 (In-Distribution)
Best result: AutoMetrics, Kendall's Tau 0.342
[Chart: Kendall's Tau over time, data point dated Dec 19, 2025]
Evaluation Results
Method                Backbone         Date      Kendall's Tau
AutoMetrics           Qwen-3-32B       2025.12   0.342
LLM-Judge             Qwen-3-32B       2025.12   0.334
Best Existing Metric  Model Agnostic   2025.12   0.327
AutoMetrics           GPT-4o-mini      2025.12   0.324
DnA Eval              Qwen-3-32B       2025.12   0.26
LLM-Judge             GPT-4o-mini      2025.12   0.259
DnA Eval              GPT-4o-mini      2025.12   0.255
MetaMetrics           Model Agnostic   2025.12   0.204
Finetuned LLM         Model Agnostic   2025.12   0.039
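Each method is ranked by Kendall's Tau between human ratings and the method's metric scores. The page does not say which tau variant is used or how the scores are paired, so the sketch below rests on assumptions: it computes Kendall's tau-b (which corrects for ties, frequent when human ratings are small integers) over hypothetical per-response (human rating, metric score) pairs, by brute-force pair counting. The function name and the sample data are illustrative only.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length score lists (O(n^2) sketch).

    Assumption: tau-b is the variant of interest; the leaderboard does not say.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    ties_x = ties_y = 0  # pairs tied in x (resp. y), joint ties counted in both
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx = (xi > xj) - (xi < xj)  # sign of the difference in x
        dy = (yi > yj) - (yi < yj)  # sign of the difference in y
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1  # both lists order this pair the same way
        elif dx * dy < 0:
            discordant += 1  # the lists disagree on this pair
    n0 = len(x) * (len(x) - 1) // 2  # total number of pairs
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        return float("nan")  # one list is constant, so tau is undefined
    return (concordant - discordant) / denom


# Hypothetical data: human ratings (0-4) and one metric's scores for 5 responses.
human = [4, 3, 3, 1, 0]
metric = [0.9, 0.7, 0.8, 0.2, 0.3]
print(round(kendall_tau_b(human, metric), 3))  # ≈ 0.738
```

Here 8 pairs are concordant, 1 is discordant, and 1 is tied in the human ratings, giving 7 / sqrt(9 × 10). The quadratic loop is kept for readability; for large evaluation sets, scipy.stats.kendalltau (which computes tau-b by default) is a faster cross-check.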