Robot Failure Analysis (MCQ) on RoboFAC (Real-world)
[Chart: FD / FI / FL scores. Highlighted point: GPT-4o, FD 96, Apr 8, 2026]
Updated 1mo ago
Evaluation Results
Method                   Finetuned   Date      FD   FI   FL
GPT-4o                   false       2026.04   96   43   52
KITE+Qwen2.5-7B+QLoRA    true        2026.04   89   58   77
KITE + Qwen2.5-VL-7B     false       2026.04   84   43   74
Qwen2.5-VL-7B            false       2026.04   83   38   72
RoboFAC-7B               true        2026.04   80   56   71
Gemini-2.0               false       2026.04   60   11   18
Qwen2.5-VL-3B            false       2026.04    4    3    7