Scenario Understanding on Scenario Understanding (ID)
[Chart: Accuracy and Success Rate by date. Top result: TIMEOMNI-1, 90.7 Accuracy (Sep 29, 2025). Updated 4d ago.]
Evaluation Results
| Method                    | Configuration             | Date    | Accuracy | Success Rate |
|---------------------------|---------------------------|---------|----------|--------------|
| TIMEOMNI-1                | Base LLM=Qwen2.5-Instr... | 2025.09 | 90.7     | 97.5         |
| GPT-4.1-2025-04-14        | -                         | 2025.09 | 85.5     | 100          |
| GPT-4.1-Nano              | -                         | 2025.09 | 66.2     | 97.5         |
| Mistral-Small-3.1-24B-Ins | -                         | 2025.09 | 64.8     | 100          |
| Llama-3.1-70B-Instruct    | -                         | 2025.09 | 56.4     | 100          |
| Qwen2.5-Instruct-7B       | -                         | 2025.09 | 48.5     | 100          |
| Mistral-7B-v0.3           | -                         | 2025.09 | 40.5     | 92.2         |
| Llama-3.1-8B-Instruct     | -                         | 2025.09 | 36.6     | 46.5         |
| Time-MQA                  | Base LLM=Llama3-8B        | 2025.09 | 32.2     | 29.5         |
| Time-R1                   | Base LLM=Qwen2.5-Instr... | 2025.09 | 30.9     | 94           |
| Time-MQA                  | Base LLM=Qwen2.5-7B       | 2025.09 | 25       | 14           |
| Time-MQA                  | Base LLM=Mistral-7B-v0.3  | 2025.09 | 15.1     | 21.5         |