Spatio-temporal Reasoning on Spatio-temporal Reasoning Dataset Overall
[Chart: Frame F1 (F1f) by date; highlighted point: Q-SFT+RL, 87.5, Apr 8, 2026. Selectable metrics: Frame F1 (F1f), Exact Match (EM), Segment F1 (F1s).]
Updated 9d ago
Evaluation Results
| Method | Details | Links | Frame F1 (F1f) | Exact Match (EM) | Segment F1 (F1s) |
|---|---|---|---|---|---|
| Q-SFT+RL | Configuration=C2, Trai... | 2026.04 | 87.5 | 64.5 | 72.3 |
| GPT-4.1 | Version=05/01/2025 | 2026.04 | 84.8 | 35 | 61 |
| Q-SFT | Configuration=C1, Trai... | 2026.04 | 80.4 | 56.6 | 63.6 |
| Q-Baseline | Backbone=Qwen2.5-Coder... | 2026.04 | 48.5 | 25 | 32.6 |