Language Modeling on HELM macro-averaged (test)

73.2 Accuracy

Zero-Shot Predict

Updated 3mo ago

Evaluation Results

Method	Links	Accuracy
Zero-Shot Predict 2025.11		73.2
BFRS 2025.11		73.1
MIPROv2 2025.11		73.1
Zero-Shot CoT 2025.11		72.7
HELM Baseline 2025.11		70.9
MIPROv2 2025.11		69.8
Zero-Shot CoT 2025.11		69.4
BFRS 2025.11		69.3
Zero-Shot CoT 2025.11		66.2
BFRS 2025.11		66.2
MIPROv2 2025.11		66.2
BFRS 2025.11		65.9
Zero-Shot CoT 2025.11		65.7
MIPROv2 2025.11		65.3
Zero-Shot Predict 2025.11		65.1
HELM Baseline 2025.11		64.8
BFRS 2025.11		64.2
Zero-Shot CoT 2025.11		64.0
MIPROv2 2025.11		62.1
Zero-Shot Predict 2025.11		61.7
HELM Baseline 2025.11		61.4
HELM Baseline 2025.11		61
MIPROv2 2025.11		60.4
BFRS 2025.11		60.3
Zero-Shot Predict 2025.11		59.7
Zero-Shot Predict 2025.11		59.2
Zero-Shot CoT 2025.11		59.0
HELM Baseline 2025.11		57.2
Zero-Shot Predict 2025.11		49.3
HELM Baseline 2025.11		47.8