Etiological Reasoning on Spatio-Temporal Synthetic Dataset 1.0 (test)
[Chart: Accuracy over time. Best result: 95.65, STReasoner-8B (Ours), Jan 6, 2026.]
Updated 4d ago
Evaluation Results
| Method | Category | Date | Accuracy |
| --- | --- | --- | --- |
| STReasoner-8B (Ours) | Category=Spatio-Tempor... | 2026.01 | 95.65 |
| Qwen3-VL-8B-Instruct - SFT+S-GRPO | Category=Open-Source M... | 2026.01 | 91.79 |
| Qwen3-VL-8B-Instruct - SFT | Category=Open-Source M... | 2026.01 | 90.82 |
| Qwen3-8B - SFT+S-GRPO | Category=Open-Source M... | 2026.01 | 89.37 |
| GPT-5.2 | Category=Proprietary M... | 2026.01 | 86.47 |
| GPT-5.2 | Category=Proprietary M... | 2026.01 | 83.09 |
| Qwen3-8B - SFT | Category=Open-Source M... | 2026.01 | 82.13 |
| Claude-4.5-Sonnet | Category=Proprietary M... | 2026.01 | 79.61 |
| Claude-4.5-Sonnet | Category=Proprietary M... | 2026.01 | 78.64 |
| Qwen3-VL-8B-Instruct | Category=Open-Source M... | 2026.01 | 68.6 |
| Time-R1-7B | Category=Time Series R... | 2026.01 | 60.39 |
| ChatTS-8B | Category=Time Series L... | 2026.01 | 56.52 |
| Qwen3-8B | Category=Open-Source M... | 2026.01 | 21.26 |
| Time-MQA-7B | Category=Time Series L... | 2026.01 | 19.32 |