Video Reasoning on SEED-Bench L2 OOD R1
Top result: 51.6 accuracy (APPO)
[Figure: accuracy of reported results over time, Feb 27, 2026]
Evaluation Results
Method        Configuration              Date     Accuracy (%)
APPO          Backbone=Qwen2.5-VL-7B...  2026.02  51.6
DAPO          Backbone=Qwen2.5-VL-7B...  2026.02  51.3
GRPO          Backbone=Qwen2.5-VL-7B...  2026.02  49.7
SFT           Backbone=Qwen2.5-VL-7B...  2026.02  42.8
APPO          Size=7B, Training Data...  2026.02  40.0
APPO          Backbone=Qwen2.5-VL-3B...  2026.02  39.1
DAPO          Backbone=Qwen2.5-VL-3B...  2026.02  37.5
GRPO          Backbone=Qwen2.5-VL-3B...  2026.02  35.7
VideoChat-R1  Size=7B, Training Data...  2026.02  34.4
SFT           Backbone=Qwen2.5-VL-3B...  2026.02  33.7
Base Model    Backbone=Qwen2.5-VL-7B...  2026.02  32.7
VideoRFT      Size=7B, Training Data...  2026.02  32.5
Video-R1      Size=7B, Training Data...  2026.02  32.3
Base Model    Backbone=Qwen2.5-VL-3B...  2026.02  29.4
TW-GRPO       Size=7B, Training Data...  2026.02  29.0
GRPO-CARE     Size=7B, Training Data...  2026.02  28.3
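The Backbone=Qwen2.5-VL-* rows can be read as controlled comparisons against the untuned base model of the same size. The minimal Python sketch below computes each method's gain in percentage points over that base model. It assumes that rows sharing a backbone prefix differ only in training method, which cannot be checked here because the configuration strings are truncated ("..."). It leaves out the "Size=7B, Training Data..." rows because their training setups are not visible. The names RESULTS and gains_over_base are hypothetical helpers for illustration only.

```python
# Accuracy values copied from the table above (Backbone=Qwen2.5-VL-* rows only).
# Assumption: rows with the same backbone differ only in the training method.
RESULTS: dict[str, dict[str, float]] = {
    "7B": {"Base Model": 32.7, "SFT": 42.8, "GRPO": 49.7, "DAPO": 51.3, "APPO": 51.6},
    "3B": {"Base Model": 29.4, "SFT": 33.7, "GRPO": 35.7, "DAPO": 37.5, "APPO": 39.1},
}


def gains_over_base(results: dict[str, dict[str, float]]) -> dict[tuple[str, str], float]:
    """Return each method's accuracy gain (percentage points) over the base model
    of the same backbone size (hypothetical helper)."""
    gains: dict[tuple[str, str], float] = {}
    for size, methods in results.items():
        base = methods["Base Model"]
        for method, acc in methods.items():
            if method == "Base Model":
                continue  # the base model is the reference point, not a result
            # Round to one decimal to match the table's reported precision
            # and to hide floating-point noise from the subtraction.
            gains[(size, method)] = round(acc - base, 1)
    return gains


if __name__ == "__main__":
    for (size, method), gain in sorted(gains_over_base(RESULTS).items()):
        print(f"Qwen2.5-VL-{size} {method}: +{gain:.1f}")
```

Running the script prints one line per method. For example, the 7B APPO row gives "Qwen2.5-VL-7B APPO: +18.9", and the 3B SFT row gives "Qwen2.5-VL-3B SFT: +4.3".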