
TreeBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Grounded Reasoning | TreeBench | Overall Score: 54.8 | 128 |
| Perception | TWI-oriented TreeBench online setting | Accuracy: 70.5 | 16 |
| Reasoning | TWI-oriented TreeBench online setting | Accuracy: 37.1 | 16 |
| Perception | TWI-oriented TreeBench (offline) | Accuracy: 71.1 | 12 |
| Reasoning | TreeBench TWI-oriented (offline) | Accuracy: 37.1 | 12 |
| Visual Perception | TreeBench | Score: 45.2 | 9 |
| Visual Grounding | TreeBench | Error Rate: 24.6 | 6 |