Knowledge-Intensive Reasoning on HQA
Top Average Score: 87 (AutoTraj)
[Chart: Average Score over time; latest data point Jan 30, 2026]
Evaluation Results
| Method | Setting | Date | Average Score |
|---|---|---|---|
| AutoTraj | Category=SFT-RL TIR Me... | 2026.01 | 87 |
| Qwen2.5-7B-Instruct | Framework=Multi-Dimens... | 2026.01 | 85 |
| AutoTIR | Category=RL-only TIR M... | 2026.01 | 85 |
| R1-Searcher | Category=RL-only TIR M... | 2026.01 | 83 |
| ReSearch | Category=RL-only TIR M... | 2026.01 | 82 |
| Tool-Star-SFT | Category=SFT-only TIR... | 2026.01 | 77 |
| Vanilla SFT-RL TIR | Category=SFT-RL TIR Me... | 2026.01 | 75 |
| Tool-Star | Category=SFT-RL TIR Me... | 2026.01 | 74 |
| ToRL | Category=RL-only TIR M... | 2026.01 | 72 |
| AutoTIR | Backbone=Qwen2.5-7B, T... | 2026.01 | 30.5 |
| ReSearch | Backbone=Qwen2.5-7B, T... | 2026.01 | 28.5 |
| AutoTraj | Backbone=Qwen2.5-7B, T... | 2026.01 | 28.5 |
| Vanilla SFT-RL TIR | Backbone=Qwen2.5-7B, T... | 2026.01 | 28 |
| Tool-Star | Backbone=Qwen2.5-7B, T... | 2026.01 | 27 |
| Tool-Star-SFT | Backbone=Qwen2.5-7B, T... | 2026.01 | 26.5 |
| Qwen2.5-7B-Instruct | Backbone=Qwen2.5-7B, T... | 2026.01 | 24.5 |
| R1-Searcher | Backbone=Qwen2.5-7B, T... | 2026.01 | 19.5 |
| ToRL | Backbone=Qwen2.5-7B, T... | 2026.01 | 6.5 |