General Language Modeling on BIG-Bench (test)

83.6 Accuracy (Best Model)

Updated 2mo ago

Evaluation Results

Method	Links	Accuracy
Best Model 2025.10		83.6	5	-14.4
Best Model 2025.10		76.4	9	-28