Instruction Generalization on OOD Instructions Novel Prompts
[Chart: T3 Success Count [C]; highlighted point: DeLock, 95, Apr 25, 2026. Selectable metrics: T3, T2, T4 Success Count [C]; T5, T6, T7 Success Count [S]; T8 Success Count [C+S].]
Updated 1mo ago
Evaluation Results
| Method | Method details | Links | T3 Success Count [C] | T2 Success Count [C] | T4 Success Count [C] | T5 Success Count [S] | T6 Success Count [S] | T7 Success Count [S] | T8 Success Count [C+S] |
|---|---|---|---|---|---|---|---|---|---|
| DeLock | Contrastive Prompt Gui... | 2026.04 | 95 | 95 | 85 | 55 | 65 | 70 | 65 |
| π0.5-DROID | Description=generalist... | 2026.04 | 90 | - | 90 | - | - | 55 | 0 |
| DeLock w/o CPG | Contrastive Prompt Gui... | 2026.04 | 90 | 85 | 75 | 0 | 0 | 0 | 0 |
| DeLock w/ Frozen-Vis | Visual Encoder=Fixed d... | 2026.04 | 70 | 80 | 65 | 10 | 55 | 40 | 20 |
| DeLock w/o Vis-Reg | Visual Encoder Regular... | 2026.04 | 35 | 45 | 10 | 0 | 0 | 0 | 0 |
| RETAIN | Description=weight-spa... | 2026.04 | 30 | 0 | 15 | 0 | 0 | 10 | 5 |