Temporal Reasoning on Date Understanding
[Chart: Accuracy over time. Top result: GRPO, 89.53 accuracy, Apr 6, 2026.]
Updated 11d ago
Evaluation Results
| Method          | Base Model     | Date    | Accuracy |
|-----------------|----------------|---------|----------|
| GRPO            | Qwen3-4B-In... | 2026.04 | 89.53    |
| Cog-DRIFT       | Qwen3-4B-In... | 2026.04 | 88.75    |
| Few-shot        | Qwen3-4B-In... | 2026.04 | 88.37    |
| Zero-shot       | Qwen3-4B-In... | 2026.04 | 87.71    |
| NuRL (Abstract) | Qwen3-4B-In... | 2026.04 | 87.48    |
| RFT             | Qwen3-4B-In... | 2026.04 | 85.11    |
| NuRL (Prefix)   | Qwen3-4B-In... | 2026.04 | 84.17    |
| Cog-DRIFT       | Llama3.2-3B... | 2026.04 | 62.82    |
| NuRL (Abstract) | Llama3.2-3B... | 2026.04 | 56.72    |
| Few-shot        | Llama3.2-3B... | 2026.04 | 55.63    |
| NuRL (Prefix)   | Llama3.2-3B... | 2026.04 | 54.91    |
| Zero-shot       | Llama3.2-3B... | 2026.04 | 54.87    |
| RFT             | Llama3.2-3B... | 2026.04 | 50.31    |
| GRPO            | Llama3.2-3B... | 2026.04 | 38.85    |
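
The results above mix two base models, so a method's rank depends on which base model is considered. As a rough illustration, the sketch below computes each method's accuracy difference from the Zero-shot row for the same base model. The accuracy values are copied from the results above; the names `RESULTS` and `delta_vs_zero_shot` are hypothetical helpers introduced here, not part of the benchmark, and the base-model names are left truncated as they appear in the source.

```python
# Accuracy values copied from the results listed above.
# Base-model names are truncated in the source and kept as-is.
RESULTS = {
    "Qwen3-4B-In...": {
        "GRPO": 89.53,
        "Cog-DRIFT": 88.75,
        "Few-shot": 88.37,
        "Zero-shot": 87.71,
        "NuRL (Abstract)": 87.48,
        "RFT": 85.11,
        "NuRL (Prefix)": 84.17,
    },
    "Llama3.2-3B...": {
        "Cog-DRIFT": 62.82,
        "NuRL (Abstract)": 56.72,
        "Few-shot": 55.63,
        "NuRL (Prefix)": 54.91,
        "Zero-shot": 54.87,
        "RFT": 50.31,
        "GRPO": 38.85,
    },
}


def delta_vs_zero_shot(results):
    """Return {base_model: {method: accuracy - zero_shot_accuracy}}.

    Differences are rounded to two decimals to match the precision
    of the reported values.
    """
    deltas = {}
    for base, scores in results.items():
        baseline = scores["Zero-shot"]
        deltas[base] = {
            method: round(acc - baseline, 2)
            for method, acc in scores.items()
            if method != "Zero-shot"
        }
    return deltas


if __name__ == "__main__":
    for base, diffs in delta_vs_zero_shot(RESULTS).items():
        print(base)
        # Largest gain over Zero-shot first.
        for method, diff in sorted(diffs.items(), key=lambda kv: kv[1], reverse=True):
            print(f"  {method:<16} {diff:+.2f}")
```

Read this way, GRPO sits slightly above Zero-shot on the Qwen3 base model but well below it on the Llama3.2 base model, while Cog-DRIFT and Few-shot are above Zero-shot on both.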