Gender Bias Evaluation on RealWorldQuestioning Health Recommendations
[Chart: Shannon Entropy (T-test Statistic) by model, May 24, 2025; top value 0.75 (ChatGPT-3.5-turbo). Selectable metrics: Shannon Entropy, CTTR, and Maas, each as T-test Statistic or p-value.]
Evaluation Results
| Model | Settings | Date | Shannon Entropy (T-test Statistic) | Shannon Entropy (p-value) | CTTR (T-test Statistic) | CTTR (p-value) | Maas (T-test Statistic) | Maas (p-value) |
|---|---|---|---|---|---|---|---|---|
| ChatGPT-3.5-turbo | Iteration=1, Evaluatio... | 2025.05 | 0.75 | 0.45 | -0.49 | 0.61 | 1.21 | 0.22 |
| ChatGPT-4-turbo | Iteration=1, Evaluatio... | 2025.05 | -0.11 | 0.90 | -0.06 | 0.94 | -0.44 | 0.65 |
| Llama-3 | Iteration=1, Evaluatio... | 2025.05 | -0.17 | 0.85 | -1.07 | 0.28 | -0.52 | 0.60 |
| DeepSeek-R1 | Iteration=1, Evaluatio... | 2025.05 | -1.29 | 0.19 | -0.10 | 0.91 | -0.74 | 0.45 |
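The leaderboard reports t-test statistics and p-values for three lexical-diversity measures but does not show how they are computed. The following is a minimal Python sketch under stated assumptions, not the benchmark's own code. It assumes that each health question is asked in a male-framed and a female-framed variant, that each metric is computed per response on lowercase word tokens, that Shannon entropy uses log base 2, and that the test is Welch's unequal-variance t-test. The helper names (`tokenize`, `shannon_entropy`, `cttr`, `maas`, `welch_t`) and the sample responses are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import mean, variance


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens; the benchmark's actual tokenizer is not stated.
    return re.findall(r"\w+", text.lower())


def shannon_entropy(tokens: list[str]) -> float:
    # H = -sum(p_i * log2(p_i)) over word types, where p_i = count_i / N.
    # Assumption: log base 2.
    n = len(tokens)
    counts = Counter(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cttr(tokens: list[str]) -> float:
    # Carroll's corrected type-token ratio: V / sqrt(2N), V = distinct types, N = tokens.
    return len(set(tokens)) / math.sqrt(2 * len(tokens))


def maas(tokens: list[str]) -> float:
    # Maas a^2 = (log N - log V) / (log N)^2; lower values mean higher lexical diversity.
    # Requires N > 1, otherwise log N = 0 and the ratio is undefined.
    n, v = len(tokens), len(set(tokens))
    return (math.log(n) - math.log(v)) / math.log(n) ** 2


def welch_t(a: list[float], b: list[float]) -> float:
    # Welch's two-sample t statistic (unequal variances assumed).
    # Needs at least two scores per group and a nonzero combined variance.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


if __name__ == "__main__":
    # Hypothetical responses to male- and female-framed variants of the same questions.
    male = ["Rest and drink fluids, and rest more.",
            "See a doctor if the fever persists for days."]
    female = ["Rest, drink water, and monitor symptoms.",
              "Please see a doctor, a doctor can help."]
    for name, metric in [("Shannon", shannon_entropy), ("CTTR", cttr), ("Maas", maas)]:
        a = [metric(tokenize(r)) for r in male]
        b = [metric(tokenize(r)) for r in female]
        print(f"{name}: t = {welch_t(a, b):.3f}")
```

To get a p-value alongside the statistic, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both. Read against the table above, every reported p-value is 0.19 or higher, so none of the four models shows a difference on these measures that is significant at the conventional 0.05 level.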