Rationale Faithfulness Evaluation on PVP
[Figure: R-D Consistency over time. Plotted result: Qwen2.5-VL-7B-Instruct (Reasoning-SFT), 99.5, May 9, 2026. Selectable metrics: R-D Consistency, R-I Groundedness, R-D Sensitivity.]
Updated 22d ago
Evaluation Results
| Method | Details | Date | R-D Consistency | R-I Groundedness | R-D Sensitivity |
|---|---|---|---|---|---|
| Qwen2.5-VL-7B-Instruct (Reasoning-SFT) | Student Model=Qwen2.5-... | 2026.05 | 99.5 | 84.7 | 74.6 |
| Phi-3.5-Vision-Instruct (Base) | Student Model=Phi-3.5-... | 2026.05 | 99 | 70.8 | 76.6 |
| Phi-3.5-Vision-Instruct (Reasoning-SFT) | Student Model=Phi-3.5-... | 2026.05 | 99 | 70.3 | 66.5 |
| Qwen2.5-VL-7B-Instruct (Base) | Student Model=Qwen2.5-... | 2026.05 | 96.2 | 72.7 | 88.5 |
| Phi-3.5-Vision-Instruct (GRPO) | Student Model=Phi-3.5-... | 2026.05 | 90.4 | 75.1 | 51.7 |
| Qwen2.5-VL-7B-Instruct (GRPO) | Student Model=Qwen2.5-... | 2026.05 | 90 | 78 | 73.7 |