Gender bias evaluation on RealWorldQuestioning Health Recommendations 1.0
[Chart: Proportion (Male More Info) by model. Top entry: Llama-3, 80.89 (May 24, 2025). Selectable metrics: Proportion (Male More Info), Proportion (Female More Info), Proportion (No Difference), Odds Ratio (F/M), p-value (Fisher Test).]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Proportion (Male More Info) (%) | Proportion (Female More Info) (%) | Proportion (No Difference) (%) | Odds Ratio (F/M) | p-value (Fisher Test) |
|---|---|---|---|---|---|---|---|
| Llama-3 | Judge Model=ChatGPT-4o... | 2025.05 | 80.89 | 17.97 | 1.12 | 0.05 | 0 |
| DeepSeek-R1 | Judge Model=ChatGPT-4o... | 2025.05 | 73.03 | 26.96 | 0 | 0.13 | 0 |
| ChatGPT-4-turbo | Judge Model=ChatGPT-4o... | 2025.05 | 56.17 | 38.2 | 5.61 | 0.48 | 0.02 |
| ChatGPT-3.5-turbo | Judge Model=ChatGPT-4o... | 2025.05 | 53.93 | 39.32 | 6.74 | 0.55 | 0.07 |
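The page does not say how the Odds Ratio (F/M) column is derived. The sketch below is one plausible reading, not the benchmark's confirmed method: it assumes the ratio is the odds of "female more info" divided by the odds of "male more info", each computed from the listed percentages. The function names are illustrative only. Under that assumption the results agree with the listed odds ratios to within about 0.01.

```python
# Assumption: Odds Ratio (F/M) = odds(female more info) / odds(male more info),
# where odds(p) = p / (1 - p) and p is the listed percentage divided by 100.
# This is an inferred definition, not one stated by the source.

def odds(p: float) -> float:
    """Convert a probability in (0, 1) to odds."""
    return p / (1.0 - p)


def odds_ratio_f_over_m(female_pct: float, male_pct: float) -> float:
    """Odds ratio of 'female more info' versus 'male more info'."""
    return odds(female_pct / 100.0) / odds(male_pct / 100.0)


# (male %, female %, listed odds ratio) copied from the results above.
rows = {
    "Llama-3": (80.89, 17.97, 0.05),
    "DeepSeek-R1": (73.03, 26.96, 0.13),
    "ChatGPT-4-turbo": (56.17, 38.2, 0.48),
    "ChatGPT-3.5-turbo": (53.93, 39.32, 0.55),
}

recomputed = {
    model: odds_ratio_f_over_m(female, male)
    for model, (male, female, _listed) in rows.items()
}

for model, (_male, _female, listed) in rows.items():
    print(f"{model}: recomputed={recomputed[model]:.3f}, listed={listed}")
```

An odds ratio below 1 under this reading means responses judged to give more information to the male framing outnumber those favoring the female framing, which matches the direction of the listed proportions for all four models.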