Workflow Reconstruction on CHATDEV (test)
[Chart: SFE by date. Top entry: AgentXRay (Selected), SFE 50.9, Feb 5, 2026.]
Evaluation Results
Method                    Details                     Date      SFE
------------------------  --------------------------  --------  ----
AgentXRay (Selected)      Config=Selected primit...   2026.02   50.9
AgentXRay (All Tools)     Config=Full primitive...    2026.02   42.5
AgentXRay w/o Tools       Ablation=Without Tools      2026.02   41.3
AFlow                     Protocol=MCTS-based wo...   2026.02   40.3
SFT                       Protocol=Direct input-...   2026.02   35.5
AgentXRay w/o Pruning     Ablation=Without Pruning    2026.02   28.6
ReAct (Claude Opus 4.5)   Protocol=ReAct-style t...   2026.02   26.7
Claude Opus 4.5           Protocol=Multi-turn se...   2026.02   25.6