BoolQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Question Answering | BoolQA | Mean Per-Step Regret: 0.185 | 15
Question Answering | BoolQA | Accuracy: 67.3 | 3