Tool Use on Tool-use multi-turn (test)
[Chart: Accuracy and Net Improvement (Delta_4); top entry Diverge (ExIt ablation), 76.8 Accuracy, Sep 4, 2025]
Updated 3mo ago
Evaluation Results
Method                    Details                     Date      Accuracy   Net Improvement (Delta_4)
Diverge (ExIt ablation)   Backbone=Qwen2.5-7B-In...   2025.09   76.8        0.6
Full ExIt                 Backbone=Qwen2.5-7B-In...   2025.09   76.4        1.2
Improve (ExIt ablation)   Backbone=Qwen2.5-7B-In...   2025.09   75.5        3.4
GRPO + curriculum         Backbone=Qwen2.5-7B-In...   2025.09   75.4        1
GRPO                      Backbone=Qwen2.5-7B-In...   2025.09   73.5       -0.5
Base model                Backbone=Qwen2.5-7B-In...   2025.09   60.5        0.1