Sensitivity to Logical Boundaries on QuestBench
[Chart: Logic-Q scores. Top result: ALIVE-Self, 0.4391 (Feb 5, 2026). Metrics: Logic-Q, Planning-Q.]
Updated 4d ago
Evaluation Results
Method                                    Date      Logic-Q   Planning-Q
ALIVE-Self (Backbone=Qwen3-30B-A3B...)    2026.02   0.4391    0.3135
Qwen3-30B-A3B-Instr                       2026.02   0.4018    0.085
GPT-4o                                    2026.02   0.3278    0.1451
DeepSeek-V3.2                             2026.02   0.2713    0.2365
Kimi-K2                                   2026.02   0.1513    0.2103