Sensitivity to Logical Boundaries on QuestBench

0.4391 Logic-Q

ALIVE-Self

Updated 4mo ago

Evaluation Results

Method	Links	Logic-Q
ALIVE-Self 2026.02		0.4391	0.3135
Qwen3-30B-A3B-Instr 2026.02		0.4018	0.085
GPT-4o 2026.02		0.3278	0.1451
DeepSeek-V3.2 2026.02		0.2713	0.2365
Kimi-K2 2026.02		0.1513	0.2103