
Language Modeling and Reasoning on BigBench Composite (Lamb, SQuAD, CoQA, BBH, LSAT, LangID)

Best Avg Score: 24 (KromHC)

[Chart: Avg Score over time (Jan 29, 2026)]

Evaluation Results

Date     Avg Score  Per-task scores (9 tasks)
2026.01  24         30.4  8.2   15.4  40.4  44.6  11.9  13.6  26.1  25
2026.01  23.7       29.2  10.8  13.8  39.2  38.8  12.9  15.8  27.8  25.4
2026.01  23.3       30    8.4   14.2  36.6  42.6  14.76 10    27    26.2
2026.01  22.9       31.6  5.8   13    39.6  42    16.7  13    20.4  24
2026.01  19.5       19    0.2   5.8   14    40.8  8.6   11.4  27.8  27.8
2026.01  18.8       19.6  0.4   4.6   19.4  38.2  5.7   6.4   29.6  28
2026.01  18.1       18.6  0     5.6   20    40.8  9.1   4.6   23.5  23.4
2026.01  17.3       17.4  0     4.8   10.6  42    5.2   9.2   23.5  26