General Language Modeling Evaluation on Aggregate Wino Hella ARC-E ARC-C MMLU
[Chart: Average Accuracy over time. Highlighted result: Teacher (GPT-OSS-20B), 56.67, May 27, 2026]
Evaluation Results
Method                     Configuration              Date     Average Accuracy
Teacher (GPT-OSS-20B)      Evaluation Mode=comple...  2026.05  56.67
DO-ACP                     Scoring=DO-ACP, K=4        2026.05  33.71
ACP                        Scoring=ACP, K=4           2026.05  33.36
CP                         Scoring=CP, K=4            2026.05  32.86
ACP                        Scoring=ACP, K=8           2026.05  32.82
SF                         Scoring=SF, K=4            2026.05  32.15
DO-ACP                     Scoring=DO-ACP, K=8        2026.05  32.11
SF                         Scoring=SF, K=8            2026.05  31.72
CP                         Scoring=CP, K=8            2026.05  31.49
Random FFN + teacher attn  -                          2026.05  30.46
Random initialization      -                          2026.05  30.02