Annotator agreement on SPS (test)
[Chart: Krippendorff's Alpha by date. Top entry: slm-judge, 57.74 (Apr 1, 2026).]
Updated 2mo ago
Evaluation Results
| Method | Configuration | Date | Krippendorff's Alpha |
|---|---|---|---|
| slm-judge | Backbone=Qwen3-1.7B, P... | 2026.04 | 57.74 |
| Early stopping without augmentation | Backbone=Qwen3-1.7B, D... | 2026.04 | 43.8 |
| Full finetuning without augmentation | Backbone=Qwen3-1.7B, D... | 2026.04 | 43.04 |
| LoRA dropout | Backbone=Qwen3-1.7B, p... | 2026.04 | 40.67 |
| GPT-5-mini-2025-08-07 | Prompting Strategy=Zer... | 2026.04 | 24.62 |
| GPT-5.2-chat | Prompting Strategy=Zer... | 2026.04 | 20.54 |
| GPT-4o | Prompting Strategy=Zer... | 2026.04 | 19.64 |
| GPT-5-nano | Prompting Strategy=Zer... | 2026.04 | 10.65 |
| GPT-5.2-chat | Prompting Strategy=Few... | 2026.04 | 4.71 |
| GPT-4o | Prompting Strategy=Few... | 2026.04 | 1.01 |
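For readers unfamiliar with the metric, the following is a minimal sketch of how Krippendorff's Alpha can be computed for nominal labels. It is not the benchmark's evaluation code: the nominal level of measurement, the function name `krippendorff_alpha_nominal`, and the input layout (one list of labels per annotated item) are assumptions made for illustration. The scores listed above also appear to be reported on a 0 to 100 scale, which is likewise an assumption; the sketch returns alpha on its usual scale, where 1 means perfect agreement and 0 means chance-level agreement.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (illustrative sketch).

    `units` is a list of lists: one inner list per item, holding the labels
    assigned by the raters who rated that item. Missing ratings are simply
    omitted. This input layout is a hypothetical choice for the example.
    """
    # Build the coincidence matrix: every ordered pair of ratings from
    # different raters within one item contributes 1 / (m - 1), where m is
    # the number of ratings the item received.
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # an item with a single rating cannot be paired
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the total number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Nominal distance: 0 for identical labels, 1 otherwise.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    )
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")

    # alpha = 1 - D_o / D_e, with D_o = observed / n and
    # D_e = expected / (n * (n - 1)).
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Two raters, ten items, binary labels (made-up data).
    rater_1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    rater_2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
    items = [list(pair) for pair in zip(rater_1, rater_2)]
    print(round(krippendorff_alpha_nominal(items), 3))
```

For ordinal or interval ratings the distance function differs, so the numbers produced by this sketch are not directly comparable to the table unless the benchmark also treats its labels as nominal.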