Task Progression Failure Detection on Cover Object Policy Success Rate: 3% (Out-of-Distribution)

100 TPR

GPT-4o Image QA

Mar 12, 2026
Updated 1mo ago

Evaluation Results

Method  Links     TPR
        2026.03   100     0   11.03
        2026.03   100     0   10.06
        2026.03   100     0    7.54
        2026.03    94   100   11.59
        2026.03    91     0   11.15
                   88   100    8.69
        2026.03    85   100   12
                   82   100    8.68
        2026.03    79   100   12
        2026.03    76   100   11.74
        2026.03    70   100   12
        2026.03     9   100    7.73
                    6   100    7.6
        2026.03     3   100    6.4
        2026.03     3   100    6.4
                    0   100    -