
Correctness Prediction on Model-Query Evaluation (112 language models, 10 public benchmarks) (test)

70.12 Accuracy (Prediction)

IRT-NET

[Chart: Accuracy (Prediction) by date; y-axis ticks 67.2184, 67.9717, 68.725, 69.4783; x-axis Jan 28, 2026]
Updated 4d ago

Evaluation Results

Date      Accuracy (Prediction)
2026.01   70.12
2026.01   70.03
2026.01   69.47
2026.01   69.07
2026.01   68.33
2026.01   68.31
2026.01   68.12
2026.01   67.38
2026.01   67.33