Failure Detection on WidowX (unseen)
[Chart: Brier Score over time. Top entry: SAFE-RNN-TDQC (Ours), 0.153, Apr 22, 2026.]
Evaluation Results
Method                  VLA Model   Date      Brier Score
SAFE-RNN-TDQC (Ours)    OpenVLA     2026.04   0.153
SAFE-MLP BCE            OpenVLA     2026.04   0.164
SAFE-MLP-TDQC (Ours)    OpenVLA     2026.04   0.169
RNN-TDQC (Ours)         OpenVLA     2026.04   0.192
SAFE-RNN                OpenVLA     2026.04   0.213
Running Avg prob.       OpenVLA     2026.04   0.257
Avg prob.               OpenVLA     2026.04   0.282
RNN-BCE                 OpenVLA     2026.04   0.344
Avg entropy             OpenVLA     2026.04   0.426
Running Avg entropy     OpenVLA     2026.04   0.432
Max prob.               OpenVLA     2026.04   0.579
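
For readers unfamiliar with the metric, the Brier score is the mean squared difference between a predicted probability and the binary outcome, so lower is better and a constant prediction of 0.5 scores 0.25. The sketch below illustrates that standard definition only. It is not taken from the benchmark's evaluation code, the function name `brier_score` is hypothetical, and the sample probabilities and labels are made up for illustration (1 stands for a failed rollout, 0 for a successful one).

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Illustrative values only, not benchmark data.
predicted_failure_prob = [0.9, 0.2, 0.6, 0.1]
failed = [1, 0, 1, 0]

score = brier_score(predicted_failure_prob, failed)
uninformed = brier_score([0.5] * 4, failed)  # constant 0.5 baseline
```

Under this definition a score above 0.25 means the detector does worse than always predicting 0.5.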