
TIR-Bench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multimodal Agent Task | TIR-Bench | Average@4 | 47.75 | 24 |
| Multimodal Tool-Use | TIR-Bench | Avg@4 | 50 | 16 |
| Visual Reasoning | TIR-Bench | Average Score | 51.8 | 15 |
| Visual Navigation | TIR-Bench Maze | Accuracy | 65 | 9 |
| Tool-Integrated Reasoning | TIR-Bench | Score | 20.8 | 4 |
| Agentic Reasoning | TIR-Bench | Accuracy | 19.8 | 3 |