Task-oriented Plan Retrieval on Average across Domains (Single-turn)
Best result: Pass Rate 83 (Semantic XPath)
[Chart: Pass Rate over time, with an alternate Token Usage view; x-axis marked Mar 1, 2026]
Evaluation Results
Method          Configuration              Date     Pass Rate  Token Usage
Semantic XPath  Model=GPT-5 mini, Scor...  2026.03  83         5,691
In-context      Model=Gemini 3 Flash,...   2026.03  79         25,387
Semantic XPath  Model=Gemini 3 Flash,...   2026.03  70.7       5,830
In-context      Model=GPT-5 mini, Scor...  2026.03  70         26,216
Semantic XPath  Model=GPT-5 mini, Scor...  2026.03  66.3       6,166
Semantic XPath  Model=Gemini 3 Flash,...   2026.03  64.3       6,043
Flat RAG        Model=Gemini 3 Flash,...   2026.03  37.7       3,361
Flat RAG        Model=GPT-5 mini, Scor...  2026.03  30         3,250
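As a rough illustration of the accuracy/cost trade-off in the results table, the sketch below re-enters its rows, picks the highest-pass-rate entry for each method, and divides pass rate by thousands of tokens. The "pass rate per 1k tokens" ratio is an ad hoc metric introduced here for illustration, not one reported by the benchmark. The `model` field keeps only the model name because the full configuration strings are truncated on the page, so two rows with the same method and model may still be different configurations. The unit of Token Usage (for example, per query or per task) is not stated, so the ratio is only meaningful for comparisons within this table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    method: str
    model: str        # model name only; the full configuration string is truncated on the page
    pass_rate: float
    token_usage: int  # as reported; the unit is not specified by the leaderboard


# Rows copied from the results table above.
RESULTS = [
    Result("Semantic XPath", "GPT-5 mini", 83.0, 5691),
    Result("In-context", "Gemini 3 Flash", 79.0, 25387),
    Result("Semantic XPath", "Gemini 3 Flash", 70.7, 5830),
    Result("In-context", "GPT-5 mini", 70.0, 26216),
    Result("Semantic XPath", "GPT-5 mini", 66.3, 6166),
    Result("Semantic XPath", "Gemini 3 Flash", 64.3, 6043),
    Result("Flat RAG", "Gemini 3 Flash", 37.7, 3361),
    Result("Flat RAG", "GPT-5 mini", 30.0, 3250),
]


def best_per_method(results: list[Result]) -> dict[str, Result]:
    """Return the highest-pass-rate entry for each method."""
    best: dict[str, Result] = {}
    for r in results:
        if r.method not in best or r.pass_rate > best[r.method].pass_rate:
            best[r.method] = r
    return best


def pass_rate_per_1k_tokens(r: Result) -> float:
    """Illustrative efficiency ratio (hypothetical metric, not from the benchmark)."""
    return r.pass_rate / (r.token_usage / 1000)


if __name__ == "__main__":
    ranked = sorted(best_per_method(RESULTS).values(), key=lambda r: -r.pass_rate)
    for r in ranked:
        print(f"{r.method:15s} {r.model:15s} pass={r.pass_rate:5.1f} "
              f"tokens={r.token_usage:6d} pass/1k tok={pass_rate_per_1k_tokens(r):.2f}")
```

Run as a script, it prints one line per method. With the values above, the best Semantic XPath entry reaches a pass rate of 83 at 5,691 tokens, while the best In-context entry needs 25,387 tokens for 79, which is what drives the large gap in the ratio.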