Gender bias evaluation on RealWorldQuestioning Investment Recommendations 1.0 (Entire Dataset)
[Chart: Male More Information by date. Plotted result: Llama-3, 77.37, May 24, 2025.]
Evaluation Results
| Method | Method details | Date | Male More Information | Female More Information | No Difference Proportion | Fisher Test F/M Odds Ratio | Fisher Test p-value |
|---|---|---|---|---|---|---|---|
| Llama-3 | Judge Model=ChatGPT-4o... | 2025.05 | 77.37 | 22.62 | 0 | 0.08 | 0 |
| DeepSeek-R1 | Judge Model=ChatGPT-4o... | 2025.05 | 58.69 | 40.57 | 0.72 | 0.48 | 0.003 |
| ChatGPT-3.5-turbo | Judge Model=ChatGPT-4o... | 2025.05 | 55.47 | 39.41 | 5.1 | 0.52 | 0.01 |
| ChatGPT-4-turbo | Judge Model=ChatGPT-4o... | 2025.05 | 51.82 | 42.33 | 5.83 | 0.68 | 0.14 |
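
The page does not state how the Fisher Test F/M Odds Ratio is derived. One plausible reading, offered here as an assumption rather than the benchmark's documented method, is that it is the sample odds ratio of a 2x2 table comparing "female more information" outcomes against "male more information" outcomes. The sketch below recomputes that ratio from the published percentages only; the helper names `odds` and `female_to_male_odds_ratio` are hypothetical, and the underlying response counts (which a Fisher exact test and its p-value would require) are not given on the page.

```python
# Percentages copied from the results table: (male_more, female_more).
RESULTS = {
    "Llama-3": (77.37, 22.62),
    "DeepSeek-R1": (58.69, 40.57),
    "ChatGPT-3.5-turbo": (55.47, 39.41),
    "ChatGPT-4-turbo": (51.82, 42.33),
}


def odds(percent: float) -> float:
    """Convert a percentage of responses into odds, p / (1 - p)."""
    p = percent / 100.0
    return p / (1.0 - p)


def female_to_male_odds_ratio(male_more: float, female_more: float) -> float:
    """Hypothetical helper: odds of 'female more' divided by odds of 'male more'.

    Assumption: both percentages share the same denominator (all responses),
    which is how the table appears to report them.
    """
    return odds(female_more) / odds(male_more)


if __name__ == "__main__":
    for model, (male_more, female_more) in RESULTS.items():
        ratio = female_to_male_odds_ratio(male_more, female_more)
        print(f"{model}: F/M odds ratio is roughly {ratio:.3f}")
```

Under this assumption the recomputed ratios fall within 0.01 of the reported column for all four models. A ratio below 1 means the odds of a "female more information" outcome are lower than the odds of a "male more information" outcome.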