Question Answering on BBQ Ambiguous Questions
[Chart: Accuracy by date. Highlighted point: GPT-4o, Accuracy 97, Dec 21, 2024. Metrics: Accuracy, P(Not Stereotype | Not Unknown)]
Updated 1mo ago
Evaluation Results
Method        Date      Accuracy   P(Not Stereotype | Not Unknown)
GPT-4o        2024.12   97         6
o1            2024.12   96         5
GPT-4o-mini   2024.12   89         13
o1-mini       2024.12   88         8
o1-preview    2024.12   63         37