Morphology Reasoning on Time-MMD Traffic (test)
Best accuracy: 43.88, KAIROSAGENT-4B (+ Turn-Level Reward RL), May 28, 2026
Updated 5d ago
Evaluation Results
Method                                        Setting                     Date      Accuracy
KAIROSAGENT-4B (+ Turn-Level Reward RL)       Horizon=96, Category=C...   2026.05   43.88
Llama-3.1-8B-Instruct                         Horizon=96, Category=C...   2026.05   43.62
KAIROSAGENT-4B (SFT-Only)                     Horizon=96, Category=C...   2026.05   40.56
KAIROSAGENT-4B (+ Outcome-Level Reward RL)    Horizon=96, Category=C...   2026.05   38.27
DeepSeek-R1                                   Horizon=96, Category=A...   2026.05   37.76
GPT-5.2                                       Horizon=96, Category=A...   2026.05   34.95
DeepSeek-R1-Distill-Qwen-7B                   Horizon=96, Category=C...   2026.05   14.29