Language Modeling on HELM macro-averaged (test)

73.2 Accuracy

Zero-Shot Predict

[Chart: accuracy over time; y-axis ticks 46.784, 53.642, 60.5, 67.358; x-axis label Nov 25, 2025]
Updated 17d ago

Evaluation Results

Date      Accuracy
2025.11   73.2
2025.11   73.1
2025.11   73.1
2025.11   72.7
2025.11   70.9
2025.11   69.8
2025.11   69.4
2025.11   69.3
2025.11   66.2
2025.11   66.2
2025.11   66.2
2025.11   65.9
2025.11   65.7
2025.11   65.3
2025.11   65.1
2025.11   64.8
2025.11   64.2
2025.11   64
2025.11   62.1
2025.11   61.7
2025.11   61.4
2025.11   61
2025.11   60.4
2025.11   60.3
2025.11   59.7
2025.11   59.2
2025.11   59
2025.11   57.2
2025.11   49.3
2025.11   47.8