Failure Detection on WidowX seen
Top result: SAFE-RNN-TDQC (Ours), Brier Score 0.096
[Chart: Brier Score by date; latest entry Apr 22, 2026]
Updated 1mo ago
Evaluation Results
Method                 VLA Model   Date      Brier Score
SAFE-RNN-TDQC (Ours)   OpenVLA     2026.04   0.096
SAFE-MLP BCE           OpenVLA     2026.04   0.127
SAFE-MLP-TDQC (Ours)   OpenVLA     2026.04   0.13
RNN-TDQC (Ours)        OpenVLA     2026.04   0.156
SAFE-RNN               OpenVLA     2026.04   0.169
Running Avg prob.      OpenVLA     2026.04   0.255
Avg prob.              OpenVLA     2026.04   0.275
RNN-BCE                OpenVLA     2026.04   0.301
Avg entropy            OpenVLA     2026.04   0.414
Running Avg entropy    OpenVLA     2026.04   0.435
Max prob.              OpenVLA     2026.04   0.572
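
For readers unfamiliar with the metric, the Brier score is the mean squared difference between a predicted probability and the binary outcome, so lower is better and a constant prediction of 0.5 scores 0.25. The sketch below shows the standard definition only. The function name `brier_score` and the sample values are hypothetical, and the page does not state how this benchmark aggregates predictions (for example per timestep or per episode), so this should not be read as the benchmark's evaluation code.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between predicted failure probabilities and 0/1 outcomes.

    Hypothetical helper for illustration; not taken from the benchmark.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if len(probs) == 0:
        raise ValueError("at least one prediction is required")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


# Made-up example: label 1 means the rollout failed, 0 means it succeeded.
example_probs = [0.9, 0.2, 0.7, 0.1]
example_labels = [1, 0, 1, 0]
example_score = brier_score(example_probs, example_labels)

# An uninformative detector that always predicts 0.5 scores exactly 0.25.
baseline_score = brier_score([0.5] * 4, example_labels)
```

Under this definition, an entry above 0.25 is doing worse than always predicting 0.5, which is a useful reference point when reading the table.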