Human-Metric Correlation on SimpEval In-Distribution
[Chart: Kendall's Tau by date. Highlighted entry: AutoMetrics, 0.321. Date shown: Dec 19, 2025.]
Updated 4d ago
Evaluation Results
Method                Backbone        Date     Kendall's Tau
AutoMetrics           GPT-4o-mini     2025.12  0.321
AutoMetrics           Qwen-3-32B      2025.12  0.316
LLM-Judge             Qwen-3-32B      2025.12  0.294
LLM-Judge             GPT-4o-mini     2025.12  0.272
Best Existing Metric  Model Agnostic  2025.12  0.246
DnA Eval              GPT-4o-mini     2025.12  0.234
MetaMetrics           Model Agnostic  2025.12  0.127
Finetuned LLM         Model Agnostic  2025.12  0.076
DnA Eval              Qwen-3-32B      2025.12  0.042
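
For readers unfamiliar with the reported statistic, the following is a minimal sketch of how a Kendall's Tau between human ratings and an automatic metric's scores can be computed. It assumes the tau-b variant (which corrects for ties); the page does not state which variant the benchmark uses. The function name `kendall_tau_b` and the sample scores are hypothetical and are not taken from SimpEval.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length score lists.

    Assumption: tau-b is used here for illustration only; the
    benchmark's exact variant is not stated on the page.
    """
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")

    concordant = discordant = ties_x_only = ties_y_only = 0
    # Compare every unordered pair of items once.
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: contributes to nothing
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # both lists order the pair the same way
        else:
            discordant += 1  # the lists disagree on the pair's order

    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denom == 0:
        return float("nan")  # undefined when one list is constant
    return (concordant - discordant) / denom


# Hypothetical data: human ratings vs. an automatic metric's scores.
human_scores = [1, 2, 3, 4, 5]
metric_scores = [0.1, 0.3, 0.2, 0.8, 0.9]
tau = kendall_tau_b(human_scores, metric_scores)
```

In this toy example nine of the ten item pairs are ordered the same way by both lists and one pair is swapped, so the value is (9 - 1) / 10 = 0.8. A higher value means the metric ranks outputs more like the human raters do.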