
RealWorldQuestioning

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Gender Bias Evaluation | RealWorldQuestioning Health Recommendations | Shannon Entropy (T-test Statistic): 0.75 | 4 |
| Gender Bias Evaluation | RealWorldQuestioning Investment Recommendations | Shannon Entropy (T-test Statistic): 0.9 | 4 |
| Gender Bias Evaluation | RealWorldQuestioning Jobs Recommendations | Shannon Entropy (T-statistic): 1.44 | 4 |
| Gender Bias Evaluation | RealWorldQuestioning Education Recommendations | Shannon Entropy (T-stat): 2 | 4 |
| Gender bias evaluation | RealWorldQuestioning Health Recommendations 1.0 | Proportion (Male More Info): 80.89 | 4 |
| Gender bias evaluation | RealWorldQuestioning Investment Recommendations 1.0 (Entire Dataset) | Male More Information: 77.37 | 4 |
| Gender bias evaluation | RealWorldQuestioning Jobs Recommendations 1.0 | Male More Information: 70.54 | 4 |
| Gender bias evaluation | RealWorldQuestioning Education Recommendations 1.0 (Entire Dataset) | Proportion Male More Information: 69.62 | 4 |