Morphology Reasoning on Time-MMD Climate (test)
[Chart: Accuracy by date. Highlighted point: DeepSeek-R1, 99.18 Accuracy, May 28, 2026.]
Evaluation Results
Method                                      Setting                    Date     Accuracy
DeepSeek-R1                                 Horizon=96, Category=A...  2026.05  99.18
KAIROSAGENT-4B (+ Turn-Level Reward RL)     Horizon=96, Category=C...  2026.05  98.08
GPT-5.2                                     Horizon=96, Category=A...  2026.05  97.8
KAIROSAGENT-4B (SFT-Only)                   Horizon=96, Category=C...  2026.05  97.8
KAIROSAGENT-4B (+ Outcome-Level Reward RL)  Horizon=96, Category=C...  2026.05  96.7
Llama-3.1-8B-Instruct                       Horizon=96, Category=C...  2026.05  52.47
DeepSeek-R1-Distill-Qwen-7B                 Horizon=96, Category=C...  2026.05  42.86