Correctness Prediction on Model-Query Evaluation (112 language models, 10 public benchmarks) (test)
Top result: IRT-NET, 70.12 Accuracy (Prediction)
[Chart: Accuracy (Prediction) over time; date shown: Jan 28, 2026]
Updated 4d ago
Evaluation Results
| Method | Details | Links | Accuracy (Prediction) |
|---|---|---|---|
| IRT-NET | Number of evaluation q... | 2026.01 | 70.12 |
| LOCUS | Number of evaluation q... | 2026.01 | 70.03 |
| EMBEDLLM | Number of evaluation q... | 2026.01 | 69.47 |
| IRT-NET | Number of evaluation q... | 2026.01 | 69.07 |
| LOCUS | Number of evaluation q... | 2026.01 | 68.33 |
| LOCUS | Number of evaluation q... | 2026.01 | 68.31 |
| EMBEDLLM | Number of evaluation q... | 2026.01 | 68.12 |
| IRT-NET | Number of evaluation q... | 2026.01 | 67.38 |
| EMBEDLLM | Number of evaluation q... | 2026.01 | 67.33 |