
SPBench

Benchmarks

Task Name                                Dataset Name       SOTA Result                         Trend
Spatial Reasoning                        SPBench            SPBench Score: 55.9                 14
Spatial Reasoning                        SPBench (test)     SI Score: 70.2                      14
Spatial Reasoning                        SPBench SI, MV 35  Accuracy: 70.6                      11
Virtual Standardized Patient simulation  SPBench            QC Score: 97.17                     9
Spatial Reasoning                        SPBench SI         Accuracy: 54.2                      9
Personality Steering                     SPBench            Agreeableness (A) Mean Score: 9.67  6