Spatio-temporal Reasoning on Spatio-temporal Reasoning Dataset 8 frames
[Chart: Frame F1 (F1f) by date. Plotted point: Q-SFT+RL, 85.6, Apr 8, 2026. Selectable metrics: Frame F1 (F1f), Exact Match (EM), Segment F1 (F1s).]
Updated 9d ago
Evaluation Results
Method      Method details             Links    Frame F1 (F1f)  Exact Match (EM)  Segment F1 (F1s)
Q-SFT+RL    Configuration=C2, Trai...  2026.04  85.6            61.1              68.6
GPT-4.1     Version=05/01/2025         2026.04  81.3            23.6              52.8
Q-SFT       Configuration=C1, Trai...  2026.04  78.6            50.7              58.2
Q-Baseline  Backbone=Qwen2.5-Coder...  2026.04  45.6            19.1              25.8