
Language Modeling on Broad evaluation suite unseen S1 (dev)

74.2 Average Accuracy

all-FA

[Chart: Average Accuracy over time, through Apr 21, 2026]

Evaluation Results

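| Date | Average Accuracy | Unlabeled metric |
|---|---|---|
| 2026.04 | 74.2 | 1001 |
| 2026.04 | 71.8 | 972 |
| 2026.04 | 71.1 | 962.9 |
| 2026.04 | 69.7 | 944.8 |
| 2026.04 | 66.8 | 906.2 |
| 2026.04 | 65.3 | 886.1 |
| 2026.04 | 60.2 | 816.9 |
| 2026.04 | 57.2 | 710.7 |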