
SPD-Faith Bench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Faithful Reasoning | SPD-Faith Bench Multi-Difference 1.0 (test) | CR | 87.7 | 12 |
| Faithful Perception | SPD-Faith Bench Multi-Difference Subset 1.0 (test) | Precision (Type: Color) | 97.5 | 12 |
| Global Perception | SPD-Faith Bench Multi-Difference 1.0 (test) | DQR Score | 68.3 | 12 |
| Multimodal Reasoning | SPD-Faith Bench Hard 1.0 | Contradiction Rate | 10.9 | 12 |
| Multimodal Reasoning | SPD-Faith Bench Medium 1.0 | Contradiction Rate | 11.3 | 12 |
| Multimodal Reasoning | SPD-Faith Bench Easy 1.0 | Contradiction Rate | 5 | 12 |