BBQ

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Bias Evaluation | BBQ | Accuracy: 99.3 | 113 |
| Question Answering | BBQ Gender | Accuracy: 82.4 | 36 |
| Question-Answering | BBQ | Accuracy: 92.12 | 36 |
| Reasoning | BBQ (test) | Accuracy (Reasoning BBQ): 97.5 | 32 |
| Question Answering | BBQ (test) | Accuracy (amb): 98.86 | 20 |
| Question Answering | BBQ Race | Accuracy: 81.9 | 18 |
| Question Answering | BBQ Nationality | Accuracy: 82.1 | 18 |
| Bias Evaluation | BBQ averaged across gender, nationality, and religion domains | Accuracy (Ambiguous): 87.73 | 16 |
| Question Answering | BBQ (Bias Benchmark for QA) v1.0 (test) | BBQ SES Score: 93.1 | 16 |
| Bias Mitigation | BBQ SingleTurn | Age Bias: 16.3 | 12 |
| Question Answering | BBQ | Disambiguation TOP-1: 83.93 | 12 |
| Question Answering | D_BBQ | Accuracy: 99.5 | 8 |
| Question Answering | BBQ Overall Llama-3 | Accuracy: 80.7 | 6 |
| Question Answering | BBQ disambiguated questions | Accuracy: 93 | 5 |
| Question Answering | BBQ (ambiguous) | Accuracy: 95 | 5 |
| Question Answering Bias Evaluation | BBQ | Accuracy (All): 79 | 5 |
| Bias Evaluation | BBQ Gender | Ambiguity Score: 47.2 | 4 |
| Bias QA | BBQ Ambig | Accuracy: 85.04 | 4 |
| Bias QA | BBQ Disambig | Accuracy: 84.85 | 2 |
| Bias Evaluation | BBQ Disambiguated | Bias Score Before: 90.07 | 1 |