Task-oriented Plan Retrieval on Meal Kit Recommendation Single-turn
[Chart: Pass Rate over time (alternate view: Token Count). Top result: Semantic XPath, Pass Rate 80, Mar 1, 2026.]
Evaluation Results
Method          | Setting                    | Date    | Pass Rate | Token Count
--------------- | -------------------------- | ------- | --------- | -----------
Semantic XPath  | Model=GPT-5 mini, Scor...  | 2026.03 | 80        | 6,728
In-context      | Model=GPT-5 mini, Scor...  | 2026.03 | 75        | 61,069
In-context      | Model=Gemini 3 Flash,...   | 2026.03 | 70        | 52,849
Semantic XPath  | Model=Gemini 3 Flash,...   | 2026.03 | 65        | 6,644
Semantic XPath  | Model=GPT-5 mini, Scor...  | 2026.03 | 60        | 7,013
Semantic XPath  | Model=Gemini 3 Flash,...   | 2026.03 | 60        | 6,966
Flat RAG        | Model=Gemini 3 Flash,...   | 2026.03 | 30        | 3,800
Flat RAG        | Model=GPT-5 mini, Scor...  | 2026.03 | 25        | 3,892
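To compare the accuracy/cost trade-off across methods, the rows of the Evaluation Results table can be aggregated per method. The sketch below is illustrative only. It assumes the values exactly as listed in the table. Because the setting strings are truncated in the source ("Scor..."), it keeps only the model name, so the two Semantic XPath runs per model are not distinguished. The helper name `summarize` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (method, model, pass rate, token count), transcribed from the table.
# Setting strings are truncated in the source, so only the model name is kept.
ROWS = [
    ("Semantic XPath", "GPT-5 mini", 80, 6728),
    ("In-context", "GPT-5 mini", 75, 61069),
    ("In-context", "Gemini 3 Flash", 70, 52849),
    ("Semantic XPath", "Gemini 3 Flash", 65, 6644),
    ("Semantic XPath", "GPT-5 mini", 60, 7013),
    ("Semantic XPath", "Gemini 3 Flash", 60, 6966),
    ("Flat RAG", "Gemini 3 Flash", 30, 3800),
    ("Flat RAG", "GPT-5 mini", 25, 3892),
]


def summarize(rows):
    """Hypothetical helper: map each method to (best pass rate, mean token count)."""
    pass_rates = defaultdict(list)
    tokens = defaultdict(list)
    for method, _model, pass_rate, token_count in rows:
        pass_rates[method].append(pass_rate)
        tokens[method].append(token_count)
    return {m: (max(pass_rates[m]), mean(tokens[m])) for m in pass_rates}


if __name__ == "__main__":
    # Print methods ordered by their best pass rate, highest first.
    ranked = sorted(summarize(ROWS).items(), key=lambda kv: -kv[1][0])
    for method, (best, avg_tokens) in ranked:
        print(f"{method}: best pass rate {best}, mean tokens {avg_tokens:,.0f}")
```

On these rows, Semantic XPath has the highest best-case pass rate while averaging roughly an eighth of the In-context token count. Flat RAG uses the fewest tokens but has the lowest pass rate.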