Spatio-temporal Reasoning on Spatio-temporal Reasoning Dataset (12 frames)
Chart: Frame F1 (F1f) over time; best result 85.2 by Q-SFT+RL (Apr 8, 2026).
Evaluation Results
Method       Settings                    Date     Frame F1 (F1f)  Exact Match (EM)  Segment F1 (F1s)
Q-SFT+RL     Configuration=C2, Trai...   2026.04  85.2            54.9              65.9
GPT-4.1      Version=05/01/2025          2026.04  79.7            23.6              49.7
Q-SFT        Configuration=C1, Trai...   2026.04  77.1            45.1              56.5
Q-Baseline   Backbone=Qwen2.5-Coder...   2026.04  45.2            18.7              23.1
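The rows above can also be compared programmatically. The following is a minimal Python sketch that assumes the scores are transcribed by hand from the Evaluation Results table; the names RESULTS, leader, and gain are hypothetical helpers for illustration, not part of any official benchmark tooling.

```python
# Hypothetical helper: scores transcribed by hand from the Evaluation Results table.
RESULTS = {
    "Q-SFT+RL":   {"F1f": 85.2, "EM": 54.9, "F1s": 65.9},
    "GPT-4.1":    {"F1f": 79.7, "EM": 23.6, "F1s": 49.7},
    "Q-SFT":      {"F1f": 77.1, "EM": 45.1, "F1s": 56.5},
    "Q-Baseline": {"F1f": 45.2, "EM": 18.7, "F1s": 23.1},
}

METRICS = ("F1f", "EM", "F1s")


def leader(metric: str) -> str:
    """Return the method with the highest score on the given metric."""
    return max(RESULTS, key=lambda method: RESULTS[method][metric])


def gain(better: str, worse: str) -> dict:
    """Absolute per-metric difference between two methods, rounded to one decimal."""
    return {k: round(RESULTS[better][k] - RESULTS[worse][k], 1) for k in METRICS}


if __name__ == "__main__":
    for metric in METRICS:
        print(metric, "leader:", leader(metric))
    # Effect of adding RL on top of SFT, and of SFT over the baseline.
    print("Q-SFT+RL vs Q-SFT:", gain("Q-SFT+RL", "Q-SFT"))
    print("Q-SFT vs Q-Baseline:", gain("Q-SFT", "Q-Baseline"))
```

Run on the listed numbers, Q-SFT+RL leads on all three metrics, adding RL to Q-SFT gains 8.1 F1f, 9.8 EM, and 9.4 F1s points, and Q-SFT improves on Q-Baseline by 31.9, 26.4, and 33.4 points respectively. GPT-4.1 scores higher than Q-SFT on Frame F1 but lower on Exact Match and Segment F1.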