Multi-hop Reasoning on 2WikiMultihopQA (Acc, Len, Faith)
[Chart: Accuracy on 2WikiMultihopQA. Highlighted point: GeoFaith, 82.1 (May 26, 2026). Selectable metrics: Accuracy, Answer Length (k), Faithfulness.]
Updated 7d ago
Evaluation Results
| Method | Backbone | Date | Accuracy | Answer Length (k) | Faithfulness |
|---|---|---|---|---|---|
| GeoFaith | Qwen3-4B | 2026.05 | 82.1 | 0.3 | 92 |
| GRPO | Qwen3-4B | 2026.05 | 81.5 | 0.4 | 84.8 |
| KnowRL | Qwen3-4B | 2026.05 | 79.8 | 0.4 | 86.5 |
| Original | Qwen3-4B | 2026.05 | 79.2 | 0.6 | 90.1 |
| TruthRL | Qwen3-4B | 2026.05 | 78.1 | 0.2 | 79.5 |
| THS | Qwen3-4B | 2026.05 | 76.6 | 0.3 | 86.8 |
| GeoFaith | Qwen3-1.7B | 2026.05 | 71.2 | 0.3 | 77.5 |
| GRPO | Qwen3-1.7B | 2026.05 | 67.4 | 0.6 | 71.9 |
| THS | Qwen3-1.7B | 2026.05 | 56.5 | 0.2 | 49.3 |
| TruthRL | Qwen3-1.7B | 2026.05 | 53.6 | 0.2 | 53.9 |
| KnowRL | Qwen3-1.7B | 2026.05 | 50.5 | 0.2 | 60.3 |
| Original | Qwen3-1.7B | 2026.05 | 34.5 | 0.3 | 52 |