Gender Bias Evaluation on RealWorldQuestioning Investment Recommendations
[Chart: Shannon Entropy (T-test Statistic). Highlighted point: ChatGPT-4-turbo, 0.9, May 24, 2025.]
Updated 1mo ago
Evaluation Results
| Method | Method settings | Links | Shannon Entropy (T-test Statistic) | Shannon Entropy (p-value) | CTTR (T-test Statistic) | CTTR (p-value) | Maas (T-test Statistic) | Maas (p-value) |
|---|---|---|---|---|---|---|---|---|
| ChatGPT-4-turbo | Iteration=1, Evaluatio... | 2025.05 | 0.9 | 0.36 | 0.87 | 0.38 | -0.68 | 0.49 |
| ChatGPT-3.5-turbo | Iteration=1, Evaluatio... | 2025.05 | 0.27 | 0.78 | -0.18 | 0.85 | 0.57 | 0.56 |
| DeepSeek-R1 | Iteration=1, Evaluatio... | 2025.05 | -0.28 | 0.77 | 1 | 0.31 | -1.29 | 0.19 |
| Llama-3 | Iteration=1, Evaluatio... | 2025.05 | -0.3 | 0.76 | 1.69 | 0.09 | -0.44 | 0.65 |
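
The results report a t-test statistic for three lexical-diversity measures: Shannon entropy, CTTR, and Maas. As a rough illustration of what such numbers could be derived from, the sketch below computes each measure over whitespace tokens and a Welch t-statistic between two groups of responses. The tokenizer, the choice of Welch's test, the function names, and the sample strings are all assumptions made for illustration. The page does not say how the benchmark tokenizes text or which t-test variant it uses.

```python
import math
from collections import Counter
from statistics import mean, variance


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the benchmark may differ.
    return text.lower().split()


def shannon_entropy(tokens):
    # Entropy (in bits) of the token frequency distribution.
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cttr(tokens):
    # Corrected type-token ratio: V / sqrt(2N).
    return len(set(tokens)) / math.sqrt(2 * len(tokens))


def maas(tokens):
    # Maas index: (log N - log V) / (log N)^2. Needs at least 2 tokens.
    n, v = len(tokens), len(set(tokens))
    return (math.log(n) - math.log(v)) / (math.log(n) ** 2)


def welch_t(a, b):
    # Welch's t-statistic for two independent samples (assumed test variant).
    return (mean(a) - mean(b)) / math.sqrt(
        variance(a) / len(a) + variance(b) / len(b)
    )


# Hypothetical responses for two prompt groups (not benchmark data).
group_a = ["invest in index funds and bonds", "consider bonds and index funds"]
group_b = ["put savings in a savings account", "keep cash in a savings account"]

entropy_a = [shannon_entropy(tokenize(t)) for t in group_a]
entropy_b = [shannon_entropy(tokenize(t)) for t in group_b]
t_stat = welch_t(entropy_a, entropy_b)
```

A p-value would additionally require the t-distribution with the Welch-Satterthwaite degrees of freedom, which is omitted here to keep the sketch dependency-free.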