Task Progression Failure Detection on Close Box Out-of-Distribution
[Chart: TPR by date. Data point shown: GPT-4o Image QA, TPR 100, Mar 12, 2026. Metric options: TPR, TNR, Detection Time (s).]
Evaluation Results
Method                      Method details             Date     TPR  TNR  Detection Time (s)
GPT-4o Image QA             Failure Detector Categ...  2026.03  100  -    22.68
Gemini 1.5 Pro Video QA     Failure Detector Categ...  2026.03  98   -    15.47
Gemini 1.5 Pro Image QA     Failure Detector Categ...  2026.03  96   -    23.2
Sentinel                    Components=STAC MMD* +...  2026.03  96   -    12.2
GPT-4o Video QA*            Failure Detector Categ...  2026.03  87   -    22
Claude 3.5 Sonnet Video QA  Failure Detector Categ...  2026.03  80   -    23.2
Claude 3.5 Sonnet Image QA  Failure Detector Categ...  2026.03  78   -    23.2
Temporal Non-Distr. Min.    Failure Detector Categ...  2026.03  67   -    7.46
STAC For. KL                Failure Detector Categ...  2026.03  61   -    8.14
STAC Rev. KL                Failure Detector Categ...  2026.03  61   -    10.11
STAC MMD*                   Failure Detector Categ...  2026.03  61   -    9.06
Diffusion Output Variance   Failure Detector Categ...  2026.03  28   -    11.57
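
The results above can be compared on two axes at once: TPR and detection time. The following is a minimal Python sketch that transcribes the table rows and lists the methods not beaten on both axes by any other entry. It assumes that a lower detection time is better, which the page itself does not state, and the names `ROWS`, `rank`, and `pareto_front` are hypothetical helpers introduced only for this illustration. TNR is left out because every row reports "-".

```python
# Rows transcribed from the results table: (method, TPR, detection time in seconds).
# TNR is omitted because every row reports "-".
ROWS = [
    ("GPT-4o Image QA", 100, 22.68),
    ("Gemini 1.5 Pro Video QA", 98, 15.47),
    ("Gemini 1.5 Pro Image QA", 96, 23.2),
    ("Sentinel", 96, 12.2),
    ("GPT-4o Video QA*", 87, 22.0),
    ("Claude 3.5 Sonnet Video QA", 80, 23.2),
    ("Claude 3.5 Sonnet Image QA", 78, 23.2),
    ("Temporal Non-Distr. Min.", 67, 7.46),
    ("STAC For. KL", 61, 8.14),
    ("STAC Rev. KL", 61, 10.11),
    ("STAC MMD*", 61, 9.06),
    ("Diffusion Output Variance", 28, 11.57),
]


def rank(rows):
    """Sort by TPR descending, breaking ties by detection time ascending."""
    return sorted(rows, key=lambda r: (-r[1], r[2]))


def pareto_front(rows):
    """Return names of methods that no other row dominates.

    A row dominates another if its TPR is at least as high and its detection
    time at least as low, with a strict improvement on one of the two.
    Assumption: lower detection time is better.
    """
    front = []
    for name, tpr, t in rows:
        dominated = any(
            (o_tpr >= tpr and o_t <= t) and (o_tpr > tpr or o_t < t)
            for _, o_tpr, o_t in rows
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    for name, tpr, t in rank(ROWS):
        print(f"{name:28s} TPR={tpr:3d}  time={t:.2f}s")
    print("Pareto front:", pareto_front(ROWS))
```

Under that assumption, the trade-off view keeps four entries: GPT-4o Image QA, Gemini 1.5 Pro Video QA, Sentinel, and Temporal Non-Distr. Min. Each of the remaining methods is matched or beaten on both TPR and detection time by at least one of these.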