Workflow Reconstruction on CHATDEV (test)
[Chart: SFE over time. Highlighted point: AgentXRay (Selected), SFE 50.9, Feb 5, 2026. Updated 1mo ago.]
Evaluation Results
| Method | Details | Date | SFE |
| --- | --- | --- | --- |
| AgentXRay (Selected) | Config=Selected primit... | 2026.02 | 50.9 |
| AgentXRay (All Tools) | Config=Full primitive... | 2026.02 | 42.5 |
| AgentXRay w/o Tools | Ablation=Without Tools | 2026.02 | 41.3 |
| AFlow | Protocol=MCTS-based wo... | 2026.02 | 40.3 |
| SFT | Protocol=Direct input-... | 2026.02 | 35.5 |
| AgentXRay w/o Pruning | Ablation=Without Pruning | 2026.02 | 28.6 |
| ReAct (Claude Opus 4.5) | Protocol=ReAct-style t... | 2026.02 | 26.7 |
| Claude Opus 4.5 | Protocol=Multi-turn se... | 2026.02 | 25.6 |