Task-oriented Interaction on To Do List Multi-turn
[Chart: Pass Rate by date, with Token Usage as an alternate view. Highlighted point: Semantic XPath, Pass Rate 100, Mar 1, 2026.]
Evaluation Results
| Method | Configuration | Date | Pass Rate | Token Usage |
|---|---|---|---|---|
| Semantic XPath | Backbone=GPT-5 mini, S... | 2026.03 | 100 | 6,132 |
| Semantic XPath | Backbone=GPT-5 mini, S... | 2026.03 | 100 | 5,649 |
| Semantic XPath | Backbone=Gemini 3 Flas... | 2026.03 | 100 | 5,872 |
| Semantic XPath | Backbone=Gemini 3 Flas... | 2026.03 | 100 | 5,526 |
| In-context | Backbone=GPT-5 mini | 2026.03 | 70 | 20,398 |
| In-context | Backbone=Gemini 3 Flash | 2026.03 | 70 | 23,602 |
| Flat RAG | Backbone=GPT-5 mini | 2026.03 | 50 | 3,043 |
| Flat RAG | Backbone=Gemini 3 Flash | 2026.03 | 50 | 2,959 |