User-aligned task completion on KitchenAmbig (OOD)
[Chart: Accuracy over time. Top result: Adaptive CLIPR, 87.3 Accuracy, May 12, 2026.]
Updated 20d ago
Evaluation Results
Method           Settings                   Links      Accuracy
Adaptive CLIPR   LLM engine=Claude Sonn...  2026.05    87.3
GATE             LLM engine=Claude Sonn...  2026.05    85.3
CIPHER           LLM engine=Claude Sonn...  2026.05    80.5