Consistency Evaluation on Diagnostic (Avg. YouCook2, COIN, CrossTask) (test)
[Chart: State Accuracy. Best result: CAST, 76.92 (Mar 9, 2026). Metrics shown: State Accuracy, Identification Accuracy.]
Evaluation Results
Method                 Configuration              Date     State Accuracy  Identification Accuracy
CAST                   Backbone=VideoPrism-B,...  2026.03  76.92           77.66
CAST                   Backbone=InternVideo2-...  2026.03  75.43           77.77
VideoPrism-B           Backbone=VideoPrism-B,...  2026.03  68.38           33.68
CAST                   Backbone=GME-Qwen2-VL-...  2026.03  67.2            72.28
CAST                   Backbone=Qwen3-VL-Embe...  2026.03  66.18           69.11
InternVideo2-1B        Backbone=InternVideo2-...  2026.03  65.7            30.85
Qwen3-VL-Embedding-2B  Backbone=Qwen3-VL-Embe...  2026.03  58.44           29.79
GME-Qwen2-VL-2B        Backbone=GME-Qwen2-VL-...  2026.03  56.31           29.73