
Sensitivity to Logical Boundaries on QuestBench

[Chart: Logic-Q score. Labeled result: ALIVE-Self, 0.4391, Feb 5, 2026.]
Updated 4d ago

Evaluation Results

Method    Links
2026.02
0.4391    0.3135
0.4018    0.085
2026.02
0.3278    0.1451
0.2713    0.2365
2026.02
0.1513    0.2103