StereoSet

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Bias Measurement | StereoSet | Overall SS: 63.17 | 25 |
| Stereotype Bias Evaluation | StereoSet Gender | LMS Score: 85.6 | 15 |
| Out-of-Domain (OOD) Bias Evaluation | Stereoset | Accuracy: 67.2 | 14 |
| Reasoning-intensive classification | StereoSet (test) | Macro F1 Score: 93 | 12 |
| Stereotypical Bias Evaluation | StereoSet (dev) | Overall LMS Score: 84.172 | 12 |
| Stereotype Bias Evaluation | StereoSet (test) | Gender SS: 77.12 | 8 |
| Stereotype Detection | StereoSet n=237 | Accuracy: 93.4 | 5 |
| Language Model Debiasing | StereoSet (test) | LMS Score: 0.8535 | 5 |
| Bias Evaluation | StereoSet intrasentence | Gender SS: 67.34 | 3 |
| Stereotype Bias Evaluation | StereoSet Overall | LMS: 77.6 | 2 |
| Stereotype Bias Evaluation | StereoSet Race | LMS: 77 | 2 |
| Stereotype Bias Evaluation | StereoSet Religion | LMS: 84 | 2 |
| Stereotype Bias Evaluation | StereoSet Profession | LMS: 78.4 | 2 |