Agentic Task Performance on Agent Task Benchmark 240 documents 1.0 (Evaluation set)
[Chart: Information Lookup Success Rate. Top result: 92.3, OBJECTGRAPH(E), Apr 30, 2026.]
Updated 1mo ago
Evaluation Results
All scores are percentages. Method descriptions are truncated in the source.

Method                              OBJECTGRAPH(E)             OBJECTGRAPH                MD                         RAG
Description                         explicit edge declarat...  explicit edge declarat...  Representation Format=...  Representation Format=...
Date                                2026.04                    2026.04                    2026.04                    2026.04
Information Lookup Success Rate     92.3                       92.1                       91.2                       87.4
Procedure Execution Success Rate    90.1                       89.4                       88.6                       83.1
Multi-step Planning Success Rate    86.2                       85.7                       84.3                       79.8
Role-Conditional Performance        95.1                       94.8                       76.4                       71.2
Cross-node Reasoning Success Rate   80.3                       77.9                       82.1                       74.6
Update Detection Accuracy           91.6                       91.4                       61.3                       54.7
Assertion Verification Accuracy     96.5                       96.3                       52.8                       48.1
Multi-agent Handoff Success Rate    94.1                       93.2                       71.4                       69.3
Average Performance                 90.8                       90.1                       76                         71
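The page does not say how Average Performance is computed. The reported values are consistent with an unweighted mean of the eight per-metric scores, rounded to one decimal place. The sketch below checks that reading against the numbers in the results table. The names `SCORES`, `REPORTED_AVERAGE`, `mean_score`, and `check_averages` are illustrative and do not come from the benchmark.

```python
# Scores copied from the results table, in this order:
# Information Lookup, Procedure Execution, Multi-step Planning,
# Role-Conditional, Cross-node Reasoning, Update Detection,
# Assertion Verification, Multi-agent Handoff.
SCORES = {
    "OBJECTGRAPH(E)": [92.3, 90.1, 86.2, 95.1, 80.3, 91.6, 96.5, 94.1],
    "OBJECTGRAPH": [92.1, 89.4, 85.7, 94.8, 77.9, 91.4, 96.3, 93.2],
    "MD": [91.2, 88.6, 84.3, 76.4, 82.1, 61.3, 52.8, 71.4],
    "RAG": [87.4, 83.1, 79.8, 71.2, 74.6, 54.7, 48.1, 69.3],
}

# Average Performance values as shown on the page.
REPORTED_AVERAGE = {
    "OBJECTGRAPH(E)": 90.8,
    "OBJECTGRAPH": 90.1,
    "MD": 76.0,
    "RAG": 71.0,
}


def mean_score(values):
    """Unweighted arithmetic mean of a list of scores."""
    return sum(values) / len(values)


def check_averages(scores, reported, tolerance=0.05):
    """Return, per method, whether the unweighted mean matches the
    reported average to within rounding at one decimal place.

    Assumption: equal weighting of the eight metrics. The source does
    not confirm this; the check only shows the numbers are consistent.
    """
    return {
        name: abs(mean_score(values) - reported[name]) <= tolerance + 1e-9
        for name, values in scores.items()
    }


if __name__ == "__main__":
    for name, ok in check_averages(SCORES, REPORTED_AVERAGE).items():
        print(f"{name}: mean={mean_score(SCORES[name]):.3f} "
              f"reported={REPORTED_AVERAGE[name]} consistent={ok}")
```

Under this reading the two OBJECTGRAPH variants differ by under one point on average, while the gap to MD and RAG comes mostly from Role-Conditional Performance, Update Detection Accuracy, Assertion Verification Accuracy, and Multi-agent Handoff Success Rate.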