Task-oriented Interaction on Travel Itinerary Multi-turn
[Chart: Pass Rate by date. Best result: In-context, Pass Rate 90, Mar 1, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Pass Rate | Token Count |
|---|---|---|---|---|
| In-context | Backbone=GPT-5 mini | 2026.03 | 90 | 28,342 |
| Semantic XPath | Backbone=GPT-5 mini, S... | 2026.03 | 80 | 5,820 |
| In-context | Backbone=Gemini 3 Flash | 2026.03 | 60 | 31,639 |
| Semantic XPath | Backbone=Gemini 3 Flas... | 2026.03 | 60 | 5,451 |
| Semantic XPath | Backbone=Gemini 3 Flas... | 2026.03 | 60 | 5,671 |
| Semantic XPath | Backbone=GPT-5 mini, S... | 2026.03 | 50 | 5,664 |
| Flat RAG | Backbone=GPT-5 mini | 2026.03 | 20 | 3,250 |
| Flat RAG | Backbone=Gemini 3 Flash | 2026.03 | 20 | 3,164 |