Spatiotemporal reasoning on ST-Bench 2026
[Chart: Exact Match (EM) by date. Top result: STAR, 55.3 EM, May 11, 2026.]
Updated 22d ago
Evaluation Results

Method            Configuration               Date     Exact Match (EM)
STAR              Backbone=Qwen3-8B, Que...   2026.05  55.3
LLM-only (CoT)    Backbone=Qwen3-8B, Que...   2026.05  45
Graph-of-Thought  Backbone=Qwen3-8B, Que...   2026.05  42.5
Reflexion         Backbone=Qwen3-8B, Que...   2026.05  40
Tree-of-Thought   Backbone=Qwen3-8B, Que...   2026.05  40
LLM-only          Backbone=Qwen3-8B, Que...   2026.05  37.5
Function-calling  Backbone=Qwen3-8B, Que...   2026.05  33.3
ReAct             Backbone=Qwen3-8B, Que...   2026.05  26.7