BBH (Big-Bench-Hard)

Benchmarks

Task Name          Dataset Name                 SOTA Result     Trend
General Reasoning  BBH (Big-Bench-Hard) (test)  Accuracy: 81.8  20