Scenario Understanding on Scenario Understanding (ID)

90.7 Accuracy

TIMEOMNI-1

Updated 5mo ago

Evaluation Results

Method	Date	Accuracy	(unlabeled)
TIMEOMNI-1	2025.09	90.7	97.5
GPT-4.1-2025-04-14	2025.09	85.5	100
GPT-4.1-Nano	2025.09	66.2	97.5
Mistral-Small-3.1-24B-Ins	2025.09	64.8	100
Llama-3.1-70B-Instruct	2025.09	56.4	100
Qwen2.5-Instruct-7B	2025.09	48.5	100
Mistral-7B-v0.3	2025.09	40.5	92.2
Llama-3.1-8B-Instruct	2025.09	36.6	46.5
Time-MQA	2025.09	32.2	29.5
Time-R1	2025.09	30.9	94
Time-MQA	2025.09	25	14
Time-MQA	2025.09	15.1	21.5