Multi-hop QA on 2WikiMultihopQA (pass@1 accuracy)
Figure: Pass@1 accuracy over time (best result: SwiR, 81.5, as of Oct 6, 2025)
Updated 1mo ago
Evaluation Results
Method          Configuration               Date      Pass@1 Accuracy
SwiR            Backbone=Qwen3-8B           2025.10   81.5
CoT (Greedy)    Backbone=Qwen3-8B, dec...   2025.10   79.5
CoT             Backbone=Qwen3-8B           2025.10   79
Soft Thinking   Backbone=Qwen3-8B           2025.10   79
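The page does not say how pass@1 was computed, for example whether each question gets a single greedy answer or whether correctness is averaged over several sampled answers. The sketch below assumes the second case. Each question is represented as a hypothetical `(n, c)` pair: n sampled answers, of which c were graded correct. It uses the standard unbiased pass@k estimator, which for k = 1 reduces to c/n. With a single greedy answer per question (n = 1), pass@1 is simply plain accuracy. The function names and sample data are illustrative only. The margin computation uses the scores from the table.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question with n samples, c correct."""
    if n - c < k:
        # Every size-k subset contains at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over all questions, as a percentage.

    `results` is a hypothetical list of (n, c) pairs, one per question.
    """
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Pass@1 scores copied from the table (all rows use the Qwen3-8B backbone).
leaderboard = {"SwiR": 81.5, "CoT (Greedy)": 79.5, "CoT": 79.0, "Soft Thinking": 79.0}

# Percentage-point gap between SwiR and each other listed method.
margins = {
    method: round(leaderboard["SwiR"] - score, 2)
    for method, score in leaderboard.items()
    if method != "SwiR"
}

if __name__ == "__main__":
    # Hypothetical toy data: two greedy questions, two questions with 4 samples.
    print(benchmark_pass_at_1([(1, 1), (1, 0), (4, 3), (4, 1)]))  # 50.0
    print(margins)  # {'CoT (Greedy)': 2.0, 'CoT': 2.5, 'Soft Thinking': 2.5}
```

On these figures, SwiR leads greedy CoT by 2.0 points and both CoT and Soft Thinking by 2.5 points.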