Utility Evaluation on Anchor Utility Dataset
Top entry: CDA, Anchor-PPL 5.24
Mar 19, 2026
Anchor-PPL
Updated 29d ago
Evaluation Results
Method    Model Family  Date     Anchor-PPL
CDA       QWEN2.5-3B    2026.03  5.24
CDA       QWEN2.5-7B    2026.03  5.52
CDA       GEMMA-2-2B    2026.03  5.94
CDA       QWEN2.5-14B   2026.03  6.6
KLAAD     GEMMA-2-2B    2026.03  20.36
KLAAD     QWEN2.5-7B    2026.03  21.81
KLAAD     QWEN2.5-3B    2026.03  23.44
KLAAD     QWEN2.5-14B   2026.03  23.83
UGID      QWEN2.5-3B    2026.03  101.25
ORIGINAL  QWEN2.5-3B    2026.03  114.39
UGID      QWEN2.5-14B   2026.03  153.71
ORIGINAL  GEMMA-2-2B    2026.03  154.79
UGID      GEMMA-2-2B    2026.03  165.63
ORIGINAL  QWEN2.5-14B   2026.03  170.31
ORIGINAL  QWEN2.5-7B    2026.03  173.25
UGID      QWEN2.5-7B    2026.03  188.86
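The results listed above mix methods and model families, so a per-family comparison against the ORIGINAL entries can be easier to read. The following is a minimal sketch of that arithmetic only: the names `ROWS` and `ratio_to_original` are hypothetical, the values are copied from the listing, and the page itself does not state how Anchor-PPL is computed or how the ratio should be interpreted.

```python
# Rows copied from the results listing: (method, model family, Anchor-PPL).
# ROWS and ratio_to_original are illustrative names, not part of the benchmark.
ROWS = [
    ("CDA", "QWEN2.5-3B", 5.24),
    ("CDA", "QWEN2.5-7B", 5.52),
    ("CDA", "GEMMA-2-2B", 5.94),
    ("CDA", "QWEN2.5-14B", 6.6),
    ("KLAAD", "GEMMA-2-2B", 20.36),
    ("KLAAD", "QWEN2.5-7B", 21.81),
    ("KLAAD", "QWEN2.5-3B", 23.44),
    ("KLAAD", "QWEN2.5-14B", 23.83),
    ("UGID", "QWEN2.5-3B", 101.25),
    ("ORIGINAL", "QWEN2.5-3B", 114.39),
    ("UGID", "QWEN2.5-14B", 153.71),
    ("ORIGINAL", "GEMMA-2-2B", 154.79),
    ("UGID", "GEMMA-2-2B", 165.63),
    ("ORIGINAL", "QWEN2.5-14B", 170.31),
    ("ORIGINAL", "QWEN2.5-7B", 173.25),
    ("UGID", "QWEN2.5-7B", 188.86),
]


def ratio_to_original(rows):
    """Map (method, family) to that entry's Anchor-PPL divided by the
    ORIGINAL entry's Anchor-PPL for the same model family."""
    baseline = {fam: ppl for method, fam, ppl in rows if method == "ORIGINAL"}
    return {
        (method, fam): ppl / baseline[fam]
        for method, fam, ppl in rows
        if method != "ORIGINAL"
    }


if __name__ == "__main__":
    # Print one line per non-baseline entry, grouped by method then family.
    for (method, family), ratio in sorted(ratio_to_original(ROWS).items()):
        print(f"{method:6s} {family:12s} {ratio:.3f}")
```

A ratio below 1 means the entry's Anchor-PPL is lower than the ORIGINAL entry for the same model family; a ratio above 1 means it is higher.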