Preference-aligned decision making on Housekeep (test)
[Chart: Accuracy (y-axis) by date (x-axis, May 12, 2026). Top result: Adaptive CLIPR, 42.5.]
Updated 20d ago
Evaluation Results
| Method | Configuration | Date | Accuracy |
| --- | --- | --- | --- |
| Adaptive CLIPR | Feedback=Adaptive | 2026.05 | 42.5 |
| CIPHER (Sem.) | Distance Metric=Semantic | 2026.05 | 42.1 |
| CLIPR | Feedback=Iterative lea... | 2026.05 | 41.6 |
| CIPHER (Lev.) | Distance Metric=Levens... | 2026.05 | 38 |
| ICL + Answers | Mode=In-context learni... | 2026.05 | 37.8 |
| TidyBot | | 2026.05 | 35.8 |
| GATE (15-turn) | Interaction turns=15 | 2026.05 | 31.4 |
| IP | Method Identity=Intros... | 2026.05 | 30.3 |
| ICL | Mode=In-context learning | 2026.05 | 26.9 |
| Zero-shot | Protocol=Zero-shot | 2026.05 | 26.9 |