Task-oriented Plan Retrieval on To Do List Single-turn
Figure: Pass Rate chart (also available for Token Usage). Highlighted result: Semantic XPath, Pass Rate 84, Mar 1, 2026.
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Pass Rate | Token Usage |
|---|---|---|---|---|
| Semantic XPath | Model=GPT-5 mini, Scor... | 2026.03 | 84 | 5,184 |
| In-context | Model=GPT-5 mini, Scor... | 2026.03 | 80 | 5,444 |
| In-context | Model=Gemini 3 Flash,... | 2026.03 | 72 | 10,465 |
| Semantic XPath | Model=Gemini 3 Flash,... | 2026.03 | 72 | 5,605 |
| Semantic XPath | Model=Gemini 3 Flash,... | 2026.03 | 68 | 5,710 |
| Semantic XPath | Model=GPT-5 mini, Scor... | 2026.03 | 64 | 5,937 |
| Flat RAG | Model=Gemini 3 Flash,... | 2026.03 | 48 | 2,934 |
| Flat RAG | Model=GPT-5 mini, Scor... | 2026.03 | 40 | 2,722 |