Video Reasoning on SEED-Bench L3 OOD R1
[Chart: Accuracy over time. Top entry: APPO, 49.3 Accuracy, Feb 27, 2026]
Evaluation Results
Method        Configuration               Date     Accuracy
APPO          Backbone=Qwen2.5-VL-7B...   2026.02  49.3
DAPO          Backbone=Qwen2.5-VL-7B...   2026.02  48.7
GRPO          Backbone=Qwen2.5-VL-7B...   2026.02  48.2
SFT           Backbone=Qwen2.5-VL-7B...   2026.02  43.9
APPO          Backbone=Qwen2.5-VL-3B...   2026.02  35
APPO          Size=7B, Training Data...   2026.02  34.7
DAPO          Backbone=Qwen2.5-VL-3B...   2026.02  31.8
GRPO          Backbone=Qwen2.5-VL-3B...   2026.02  31
Base Model    Backbone=Qwen2.5-VL-7B...   2026.02  29.6
Video-R1      Size=7B, Training Data...   2026.02  28.8
SFT           Backbone=Qwen2.5-VL-3B...   2026.02  28.6
VideoChat-R1  Size=7B, Training Data...   2026.02  27.8
VideoRFT      Size=7B, Training Data...   2026.02  27.7
Base Model    Backbone=Qwen2.5-VL-3B...   2026.02  27
GRPO-CARE     Size=7B, Training Data...   2026.02  26
TW-GRPO       Size=7B, Training Data...   2026.02  25.6