Failure Attribution on Magentic
[Chart: Agent Accuracy and Step Accuracy over time. Highlighted point: Our Baseline, 81.2 Agent Accuracy, Feb 2, 2026.]
Updated 3mo ago
Evaluation Results
Method                                   Date      Agent Accuracy   Step Accuracy
Our Baseline (Backbone LLM=GPT-5)        2026.02   81.2             56.3
Who&When* (Backbone LLM=GPT-5, Pr...)    2026.02   6.2              56.3