Introspective Planning on KitchenAmbig (OOD)
Figure: Average Accuracy (with Standard Deviation) over time; most recent point May 12, 2026. Best result: Adaptive CLIPR, 97.6 Average Accuracy.
Updated 20d ago
Evaluation Results
| Method | Setting | Date | Average Accuracy | Standard Deviation |
|---|---|---|---|---|
| Adaptive CLIPR | elicitation turns=up t... | 2026.05 | 97.6 | 3 |
| GATE | elicitation turns=5 | 2026.05 | 96.6 | 4.1 |
| CLIPR | - | 2026.05 | 96.1 | 4.8 |
| CIPHER (Lev.) | - | 2026.05 | 93.7 | 2.8 |
| TidyBot | - | 2026.05 | 93.2 | 4 |
| CIPHER (Sem.) | - | 2026.05 | 92.7 | 3 |
| ICL + Answers | protocol=In-Context Le... | 2026.05 | 89.3 | 10.6 |
| IP (full) | - | 2026.05 | 86.3 | 9.8 |
| Zero-shot | protocol=Zero-shot | 2026.05 | 78 | 7.3 |
| ICL | protocol=In-Context Le... | 2026.05 | 66.3 | 14.3 |
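Several of the gaps between the top entries are smaller than their reported standard deviations, so the ranking alone can overstate the differences. The following is a minimal Python sketch that measures each method's gap to the best entry and flags whether that gap falls within one standard deviation. It assumes the standard deviations are comparable across rows, which the page does not confirm because it does not say how they were computed. The `gaps_to_best` helper and the one-standard-deviation rule are hypothetical illustrations. This is a rough heuristic, not a significance test.

```python
# Rows transcribed from the results table: (method, average accuracy, standard deviation).
# ASSUMPTION: the stds are comparable across rows; the page does not state how they were computed.
RESULTS = [
    ("Adaptive CLIPR", 97.6, 3.0),
    ("GATE", 96.6, 4.1),
    ("CLIPR", 96.1, 4.8),
    ("CIPHER (Lev.)", 93.7, 2.8),
    ("TidyBot", 93.2, 4.0),
    ("CIPHER (Sem.)", 92.7, 3.0),
    ("ICL + Answers", 89.3, 10.6),
    ("IP (full)", 86.3, 9.8),
    ("Zero-shot", 78.0, 7.3),
    ("ICL", 66.3, 14.3),
]


def gaps_to_best(results):
    """Hypothetical helper: for each row, return (method, gap to the best accuracy,
    whether the gap is at most the larger of the two standard deviations).
    This one-std rule is a rough heuristic, not a statistical test."""
    _, best_acc, best_std = max(results, key=lambda r: r[1])
    out = []
    for name, acc, std in results:
        gap = round(best_acc - acc, 1)  # round away float noise such as 0.9999999
        within_one_std = gap <= max(best_std, std)
        out.append((name, gap, within_one_std))
    return out


for name, gap, within in gaps_to_best(RESULTS):
    label = "within 1 std" if within else "beyond 1 std"
    print(f"{name:15s} gap {gap:4.1f}  {label}")
```

With these values, GATE and CLIPR fall within one standard deviation of Adaptive CLIPR, while CIPHER (Lev.) does not. ICL + Answers also passes, but only because its own standard deviation (10.6) is large. That result shows why a proper test over per-run scores would be preferable to this rule.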