GPT-4o responses

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Honesty | GPT-4o-mini responses (Honesty) | Win Rate (GaaA): 68.79 | 3
Robustness | GPT-4o-mini responses Robustness | GaaA Win Rate: 63.11 | 3