Planning and Tool Use on SLATE synthetic
[Chart: Tool Match Rate over time; top result EGB-Logits at 36.4, Apr 13, 2026. Selectable metrics: Tool Match Rate, Execution Success Rate, Action Identification Acc]
Updated 4d ago
Evaluation Results
Method         Base Model       Date      Tool Match Rate   Execution Success Rate   Action Identification Acc
EGB-Logits     Qwen2.5-7B-...   2026.04   36.4              67.8                     85.1
EGB-Sampling   Qwen2.5-7B-...   2026.04   33.6              51.3                     83
ReAct          Qwen2.5-7B-...   2026.04   30                29.3                     78.2
Reflexion      Qwen2.5-7B-...   2026.04   25.2              44.2                     82.9
Baseline-LLM   Qwen2.5-7B-...   2026.04   23.2              -                        -