Task-oriented Interaction on Meal Kit Recommendation Multi-turn
[Chart: Pass Rate over time. Highlighted result: In-context, Pass Rate 80, Mar 1, 2026.]
Updated 1mo ago
Evaluation Results
Method          Configuration              Date     Pass Rate  Token Count
In-context      Backbone=GPT-5 mini        2026.03  80         158,552
Semantic XPath  Backbone=Gemini 3 Flas...  2026.03  80         6,782
Semantic XPath  Backbone=GPT-5 mini, S...  2026.03  70         6,079
Semantic XPath  Backbone=Gemini 3 Flas...  2026.03  70         6,954
Semantic XPath  Backbone=GPT-5 mini, S...  2026.03  60         7,087
In-context      Backbone=Gemini 3 Flash    2026.03  40         174,567
Flat RAG        Backbone=GPT-5 mini        2026.03  10         3,686
Flat RAG        Backbone=Gemini 3 Flash    2026.03  10         3,596