SPBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Spatial Reasoning | SPBench SI | Accuracy 58.6 | 42 |
| Multi-image spatial reasoning | SPBench-MV | Accuracy 80.4 | 19 |
| Spatial Reasoning | SPBench MV | NQ Score 68.2 | 14 |
| Spatial Reasoning | SPBench | SPBench Score 55.9 | 14 |
| Spatial Reasoning | SPBench (test) | SI Score 70.2 | 14 |
| Spatial Reasoning | SPBench SI, MV 35 | Accuracy 70.6 | 11 |
| Virtual Standardized Patient simulation | SPBench | QC Score 97.17 | 9 |
| Personality Steering | SPBench | Agreeableness (A) Mean Score 9.67 | 6 |