Spatio-temporal Reasoning on Spatio-temporal Reasoning Dataset 16 frames
[Chart: Frame F1 (F1f) by date. Top entry: Q-SFT+RL, 82.7, Apr 8, 2026. Other metrics available: Exact Match (EM), Segment F1 (F1s).]
Evaluation Results
Method      Details                    Date     Frame F1 (F1f)  Exact Match (EM)  Segment F1 (F1s)
Q-SFT+RL    Configuration=C2, Trai...  2026.04  82.7            44                62.9
GPT-4.1     Version=05/01/2025         2026.04  75.6            23.1              44.7
Q-SFT       Configuration=C1, Trai...  2026.04  65.7            32.9              45.4
Q-Baseline  Backbone=Qwen2.5-Coder...  2026.04  41.6            18.7              21.5