Spatio-temporal Reasoning on Spatio-temporal Reasoning Dataset 4 frames
[Chart: Frame F1 (F1f) by date. Highlighted result: GPT-4.1, 88.8 (Apr 8, 2026). Selectable metrics: Frame F1 (F1f), Exact Match (EM), Segment F1 (F1s).]
Updated 9d ago
Evaluation Results
Method      Details                    Date     Frame F1 (F1f)  Exact Match (EM)  Segment F1 (F1s)
GPT-4.1     Version=05/01/2025         2026.04  88.8            26.9              64
Q-SFT+RL    Configuration=C2, Trai...  2026.04  88.1            67.6              69.4
Q-SFT       Configuration=C1, Trai...  2026.04  83.6            63.6              64.4
Q-Baseline  Backbone=Qwen2.5-Coder...  2026.04  47.2            21.1              32.5