
BBQ

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Bias Evaluation | BBQ | Accuracy: 99.3 | 171 |
| Question Answering | BBQ Gender | Accuracy: 82.4 | 36 |
| Question-Answering | BBQ | Accuracy: 92.12 | 36 |
| Reasoning | BBQ (test) | Accuracy (Reasoning BBQ): 97.5 | 32 |
| Question Answering | BBQ (test) | Accuracy (amb): 98.86 | 20 |
| Question Answering | BBQ Race | Accuracy: 81.9 | 18 |
| Question Answering | BBQ Nationality | Accuracy: 82.1 | 18 |
| Bias Evaluation (Religion) | BBQ Religion (test) | Accuracy (ACC): 80.5 | 16 |
| Bias Evaluation (Age) | BBQ Age (test) | Accuracy: 92.5 | 16 |
| Bias Evaluation | BBQ averaged across gender, nationality, and religion domains | Accuracy (Ambiguous): 87.73 | 16 |
| Question Answering | BBQ (Bias Benchmark for QA) v1.0 (test) | BBQ SES Score: 93.1 | 16 |
| Bias Mitigation | BBQ SingleTurn | Age Bias: 16.3 | 12 |
| Question Answering | BBQ | Disambiguation TOP-1: 83.93 | 12 |
| Abstention in Question Answering | BBQ Underspecified Intent | Abstention F1: 91.2 | 10 |
| Bias Evaluation | BBQ | Steerability Score: 1.84 | 9 |
| Question Answering | D_BBQ | Accuracy: 99.5 | 8 |
| Question Answering | BBQ Overall Llama-3 | Accuracy: 80.7 | 6 |
| Question Answering | BBQ Unambiguous Questions | Accuracy: 94 | 5 |
| Question Answering | BBQ Ambiguous Questions | Accuracy: 97 | 5 |
| Question Answering | BBQ disambiguated questions | Accuracy: 93 | 5 |
| Question Answering | BBQ (ambiguous) | Accuracy: 95 | 5 |
| Question Answering Bias Evaluation | BBQ | Accuracy (All): 79 | 5 |
| Bias Evaluation | BBQ Gender | Ambiguity Score: 47.2 | 4 |
| Bias QA | BBQ Ambig | Accuracy: 85.04 | 4 |
| Bias QA | BBQ Disambig | Accuracy: 84.85 | 2 |
Showing 25 of 26 rows