Scenario-based Reasoning (Overall) on TSRBench
[Chart: Overall Accuracy by date. Top result: VeriTime, 86.55 (Feb 8, 2026). Updated 1mo ago.]
Evaluation Results
Method                       Configuration              Date     Overall Accuracy
VeriTime                     Base Model=Qwen3-4B-In...  2026.02  86.55
VeriTime                     Base Model=Qwen2.5-3B-...  2026.02  82.86
ChatTS                       Base Model=Qwen3-4B-In...  2026.02  82.21
ChatTS                       Base Model=Qwen2.5-3B-...  2026.02  78.31
Qwen3-4B-Instruct            Training Protocol=Base     2026.02  75.48
GPT-4o-mini                  Model Type=General LLM     2026.02  70.43
Qwen2.5-7B-instruct          Model Type=General LLM     2026.02  66.81
Meta-Llama3-8B-Instruct      Model Type=General LLM     2026.02  59.22
Mistral-7B-v0.3              Model Type=General LLM     2026.02  59.22
Time-MQA                     Base Model=Mistral-7B      2026.02  53.66
Time-MQA                     Base Model=Llama3-8B       2026.02  53.05
DeepSeek-R1-Distill-Qwen-7B  Model Type=General LLM     2026.02  52.93
Time-R1                      Base Model=Qwen2.5-7B      2026.02  51.73
Time-MQA                     Base Model=Qwen2.5-7B      2026.02  41
Qwen2.5-3B-Instruct          Training Protocol=Base     2026.02  40.99