Gender bias evaluation on RealWorldQuestioning Education Recommendations 1.0 (Entire Dataset)
[Chart: Proportion Male More Information over time. Highlighted point: Llama-3, 69.62 (May 24, 2025).]
Updated 1mo ago
Evaluation Results
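| Method            | Details                   | Date    | Proportion Male More Information | Proportion Female More Information | Proportion No Difference | Fisher Test F/M Odds Ratio | Fisher Test p-value |
|-------------------|---------------------------|---------|----------------------------------|------------------------------------|--------------------------|----------------------------|---------------------|
| Llama-3           | Judge Model=ChatGPT-4o... | 2025.05 | 69.62                            | 30.37                              | 0                        | 0.19                       | 0                   |
| DeepSeek-R1       | Judge Model=ChatGPT-4o... | 2025.05 | 64.55                            | 35.44                              | 0                        | 0.3                        | 0.0004              |
| ChatGPT-3.5-turbo | Judge Model=ChatGPT-4o... | 2025.05 | 56.96                            | 40.5                               | 2.53                     | 0.51                       | 0.05                |
| ChatGPT-4-turbo   | Judge Model=ChatGPT-4o... | 2025.05 | 44.3                             | 51.89                              | 3.79                     | 1.36                       | 0.43                |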
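The Fisher test columns can be illustrated with a minimal sketch. It rests on assumptions that the page does not state: that each model was judged on 79 items (a count inferred from the percentages, e.g. 55/79 ≈ 69.62%), and that the test compares "female more information" vs. not against "male more information" vs. not in a 2x2 table. The function name `fisher_exact_2x2` and the per-model counts are hypothetical reconstructions, not the benchmark authors' code or data.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (sample odds ratio, two-sided Fisher exact p-value) for [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Two-sided p-value: sum over all tables no more probable than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)
    )
    return odds_ratio, min(p_value, 1.0)


# ASSUMPTION: 79 items per model, inferred from the reported percentages.
N_ITEMS = 79
# ASSUMPTION: (female_more, male_more) counts reconstructed from the percentages.
inferred_counts = {
    "Llama-3": (24, 55),
    "DeepSeek-R1": (28, 51),
    "ChatGPT-3.5-turbo": (32, 45),
    "ChatGPT-4-turbo": (41, 35),
}

for model, (female_more, male_more) in inferred_counts.items():
    odds, p = fisher_exact_2x2(
        female_more, N_ITEMS - female_more, male_more, N_ITEMS - male_more
    )
    print(f"{model}: F/M odds ratio = {odds:.2f}, p-value = {p:.4f}")
```

Under these assumptions the computed odds ratios round to 0.19, 0.30, 0.51, and 1.36, matching the reported column. An odds ratio below 1 indicates that responses favoring the female variant with more information were less common than those favoring the male variant.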