Video Reasoning on SEED-Bench-R1 (L1 In-Dist.)
[Chart: Accuracy by date. Highlighted point: APPO, 50.5 accuracy, Feb 27, 2026.]
Evaluation Results
Method        Details                    Date     Accuracy
APPO          Backbone=Qwen2.5-VL-7B...  2026.02  50.5
DAPO          Backbone=Qwen2.5-VL-7B...  2026.02  50
GRPO          Backbone=Qwen2.5-VL-7B...  2026.02  49
SFT           Backbone=Qwen2.5-VL-7B...  2026.02  40.2
APPO          Backbone=Qwen2.5-VL-3B...  2026.02  37.5
DAPO          Backbone=Qwen2.5-VL-3B...  2026.02  36.6
APPO          Size=7B, Training Data...  2026.02  35.4
GRPO          Backbone=Qwen2.5-VL-3B...  2026.02  35.3
VideoChat-R1  Size=7B, Training Data...  2026.02  33.3
SFT           Backbone=Qwen2.5-VL-3B...  2026.02  32.6
VideoRFT      Size=7B, Training Data...  2026.02  32.4
Video-R1      Size=7B, Training Data...  2026.02  30.9
TW-GRPO       Size=7B, Training Data...  2026.02  30.2
GRPO-CARE     Size=7B, Training Data...  2026.02  29.9
Base Model    Backbone=Qwen2.5-VL-7B...  2026.02  29.1
Base Model    Backbone=Qwen2.5-VL-3B...  2026.02  28.2