
Action-relation hallucination evaluation on R-Bench Instance

Accuracy: 75.86

RVE

[Chart: accuracy over time; latest point May 12, 2026]
Updated 21d ago

Evaluation Results

Date      Accuracy   Second metric (unlabeled in source)
2026.05   75.86      86
2026.05   75.73      85.6
2026.05   74.46      84.97
2026.05   74.46      84.56
2026.05   73.22      83.35
2026.05   72.87      83.03
2026.05   70.64      81.2
2026.05   70.45      80.67
2026.05   70.05      80.43
2026.05   69.77      79.77
2026.05   69.12      79.37
2026.05   69.01      79.3
2026.05   68.95      79.1
2026.05   68.65      79.76
2026.05   68.53      79.33
2026.05   68.5       78.8
2026.05   68.05      79
2026.05   67.69      77.61
2026.05   67.3       78.15
2026.05   66.7       77.57
2026.05   66.37      76.54
2026.05   66.25      76.4
2026.05   64.59      83.15
2026.05   62.94      73.45
2026.05   59.22      69.83