R-Bench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Action-relation hallucination evaluation | R-Bench Instance | Accuracy | 75.86 | 25 |
| Action-relation hallucination evaluation | R-Bench Image | Accuracy | 83.5 | 25 |
| Multimodal Relation Reasoning | R-Bench | Accuracy | 85.05 | 20 |
| Real-world Understanding | R-Bench | Distance Error | 55.5 | 19 |
| Multimodal Hallucination Evaluation | R-Bench | Dis | 66.68 | 13 |
| Robustness | R-Bench | R-Bench Dis Metric | 61.01 | 13 |
| Spatial-relation hallucination detection | R-Bench Instance | Accuracy | 77.39 | 8 |
| Spatial-relation hallucination detection | R-Bench Image | Accuracy | 81.13 | 8 |
| Visual Understanding | R-Bench (test) | MCQ (low) | 65.29 | 8 |
| Visual Reasoning | R-Bench-V Game | Accuracy | 20.7 | 5 |
| Visual Reasoning | R-Bench-V Physics | Accuracy | 71.3 | 5 |
| Relational Hallucination Evaluation | R-Bench | F1 Score | 79.1 | 5 |
| Scientific Reasoning | R-Bench | pass@1 Score | 61.68 | 2 |