ZebraArena

Benchmarks

Task                Dataset                          Metric     SOTA Result   Trend
Agentic Reasoning   ZebraArena multi-turn (Large)    Accuracy   82.24         2
Agentic Reasoning   ZebraArena multi-turn (Medium)   Accuracy   88.14         2
Agentic Reasoning   ZebraArena multi-turn (Small)    Accuracy   96.69         2