Embodied Task on Embodied
[Figure: Accuracy over time. Labeled point: Llama-3.2-1B-Instruct, accuracy 100, May 20, 2026.]
Updated 13d ago
Evaluation Results
Method                  Method details               Date      Accuracy
Llama-3.2-1B-Instruct   Input Type=Clean, Exec...    2026.05   100
Llama-3.2-1B-Instruct   Input Type=Clean, Exec...    2026.05   100
Llama-3.2-3B-Instruct   Input Type=Clean, Exec...    2026.05   100
Llama-3.2-3B-Instruct   Input Type=Clean, Exec...    2026.05   100
Llama-3.2-3B-Instruct   Input Type=Trigger, Ex...    2026.05   100
Qwen2.5-1.5B-Instruct   Input Type=Clean, Exec...    2026.05   100
Qwen2.5-1.5B-Instruct   Input Type=Clean, Exec...    2026.05   100
Qwen2.5-3B-Instruct     Input Type=Clean, Exec...    2026.05   100
Qwen2.5-3B-Instruct     Input Type=Clean, Exec...    2026.05   100
Qwen2.5-3B-Instruct     Input Type=Trigger, Ex...    2026.05   100
Llama-3.2-1B-Instruct   Input Type=Trigger, Ex...    2026.05   98.8
Qwen2.5-1.5B-Instruct   Input Type=Trigger, Ex...    2026.05   96.3
Llama-3.2-1B-Instruct   Input Type=Trigger, Ex...    2026.05   93.8
Llama-3.2-3B-Instruct   Input Type=Trigger, Ex...    2026.05   92.5
Qwen2.5-3B-Instruct     Input Type=Trigger, Ex...    2026.05   91.2
Qwen2.5-1.5B-Instruct   Input Type=Trigger, Ex...    2026.05   86.3