Entity Reasoning on Spatio-Temporal Synthetic Dataset 1.0 (test)
[Chart: Accuracy by date. Top entry: STReasoner-8B (Ours), 75.71 accuracy, Jan 6, 2026]
Evaluation Results
| Method | Category | Date | Accuracy |
| --- | --- | --- | --- |
| STReasoner-8B (Ours) | Spatio-Tempor... | 2026.01 | 75.71 |
| Qwen3-VL-8B-Instruct - SFT+S-GRPO | Open-Source M... | 2026.01 | 69.43 |
| Qwen3-8B - SFT+S-GRPO | Open-Source M... | 2026.01 | 65.41 |
| Qwen3-8B - SFT | Open-Source M... | 2026.01 | 62.65 |
| Qwen3-VL-8B-Instruct - SFT | Open-Source M... | 2026.01 | 61.31 |
| Claude-4.5-Sonnet | Proprietary M... | 2026.01 | 41.93 |
| Claude-4.5-Sonnet | Proprietary M... | 2026.01 | 41.85 |
| GPT-5.2 | Proprietary M... | 2026.01 | 40.54 |
| GPT-5.2 | Proprietary M... | 2026.01 | 38.78 |
| Qwen3-VL-8B-Instruct | Open-Source M... | 2026.01 | 31.16 |
| Time-R1-7B | Time Series R... | 2026.01 | 29.65 |
| ChatTS-8B | Time Series L... | 2026.01 | 19.51 |
| Time-MQA-7B | Time Series L... | 2026.01 | 14.24 |
| Qwen3-8B | Open-Source M... | 2026.01 | 5.28 |