
Language Reasoning on BBH (BIG-Bench Hard)

87.5 Average BBH Score

PREFPO

[Chart: BBH score by date, May 19, 2025 to Mar 13, 2026]
Updated 20d ago

Evaluation Results

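| Method | Date | Average | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 2026.03 | 87.5 | 97.8 | 85.9 | 76.1 | 91.5 | 81.8 | 91.7 | 96.9 | 91.9 | 73.5 | - | - | - | - | - | - |
| | 2026.03 | 87.3 | 99.4 | 95.6 | 75.1 | 89.9 | 77.6 | 90.2 | 99.9 | 86.2 | 71.4 | - | - | - | - | - | - |
| | 2026.03 | 87 | 98.7 | 94.8 | 77.8 | 90.3 | 69.8 | 92.1 | 97.3 | 91.2 | 71.4 | - | - | - | - | - | - |
| | 2026.03 | 85 | 98.7 | 91.5 | 73.2 | 76.5 | 78 | 91.1 | 94.6 | 89.6 | 71.5 | - | - | - | - | - | - |
| | 2026.03 | 83.9 | 97.1 | 77.4 | 76.4 | 80.7 | 67.7 | 91.5 | 96.6 | 92.9 | 74.9 | - | - | - | - | - | - |
| | 2026.03 | 82.9 | 99.2 | 89.6 | 72.7 | 67.6 | 85.6 | 80 | 96 | 86.8 | 68.8 | - | - | - | - | - | - |
| | 2026.03 | 75 | 9.2 | 82.4 | 70.6 | 82 | 72 | 100 | 78.4 | 98 | 82 | - | - | - | - | - | - |
| | 2026.03 | 71.3 | 74.8 | 73.2 | 72.2 | 64.4 | 54.8 | 68.8 | 77.2 | 87.2 | 68.8 | - | - | - | - | - | - |
| | 2025.05 | 47.3 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| | 2025.05 | 24.7 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| | 2025.11 | 24.28 | 23.63 | - | 47.59 | - | - | - | 16.2 | 25.29 | - | - | 24.28 | 24.52 | 21.87 | 17.61 | 17.51 |
| | 2025.05 | 21.6 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| | 2025.05 | 20.3 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| | 2025.05 | 19.9 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| | 2025.11 | 2.08 | 2.36 | - | 1.98 | - | - | - | 1.86 | 1.77 | - | 2.08 | - | 2.87 | 1.86 | 1.62 | 2.32 |
| | 2025.11 | 2.03 | 2.06 | - | 1.91 | - | - | - | 1.99 | 1.85 | - | 2.03 | - | 1.83 | 1.89 | 2.38 | 2.29 |
| | 2025.11 | 1.87 | 1.75 | - | 2.07 | - | - | - | 1.67 | 1.65 | - | 1.87 | - | 1.73 | 1.97 | 1.93 | 2.22 |
| | 2025.11 | 1.8 | 4.78 | - | 0.99 | - | - | - | 1.24 | 0.67 | - | 1.8 | - | 2.27 | 1.71 | 1.91 | 0.83 |
| | 2025.11 | 1.15 | 1.57 | - | 0.36 | - | - | - | 0.73 | 0.39 | - | 1.15 | - | 1.76 | 0.92 | 1.85 | 1.61 |
| | 2025.11 | 0.47 | 0.41 | - | 0.32 | - | - | - | 0.72 | 0.44 | - | 0.47 | - | 0.37 | 0.4 | 0.38 | 0.76 |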