Model Selection Ranking on 100-problem mini activation set (held-out)
Chart (Pearson r): top entry NEX, 0.869, Feb 5, 2026. Metric views: Pearson r, Regret@1, Hit@3.
Evaluation Results
| Method | Configuration | Date | Pearson r | Regret@1 | Hit@3 |
|---|---|---|---|---|---|
| NEX | Series=Qwen3-VL-8B, #B... | 2026.02 | 0.869 | 204 | - |
| NEX | Series=Qwen3-4B, #Bench=5 | 2026.02 | 0.859 | 90 | - |
| NEX | Series=Qwen3-VL-4B, #B... | 2026.02 | 0.782 | 570 | - |
| NEX | Series=Overall, #Bench=20 | 2026.02 | 0.778 | 267 | - |
| NEX | Series=Qwen3-VL-32B, #... | 2026.02 | 0.601 | 205 | - |
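The page reports Pearson r, Regret@1 and Hit@3 without defining them. The sketch below assumes common model-selection definitions for a single problem: Pearson r between a selector's predicted scores and the candidates' actual scores, Regret@1 as the gap between the best actual score and the actual score of the top-predicted model, and Hit@3 as whether the truly best model appears in the top 3 predictions. The function names and example scores are hypothetical. The page does not state how results are aggregated over the 100 problems or what scale the Regret@1 column uses (e.g. 204), so these outputs are not directly comparable to the table.

```python
import math


def pearson_r(predicted, actual):
    """Pearson correlation between predicted and actual scores of candidate models (assumed definition)."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


def regret_at_1(predicted, actual):
    """Gap between the best actual score and the actual score of the model ranked first (assumed definition)."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(actual) - actual[top]


def hit_at_k(predicted, actual, k=3):
    """True if the truly best model is among the top-k models by predicted score (assumed definition)."""
    ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    best = max(range(len(actual)), key=lambda i: actual[i])
    return best in ranked[:k]


# Hypothetical predicted and actual scores for four candidate models on one problem.
predicted = [0.2, 0.9, 0.5, 0.4]
actual = [0.3, 0.7, 0.8, 0.1]
print(round(pearson_r(predicted, actual), 3))
print(round(regret_at_1(predicted, actual), 3))
print(hit_at_k(predicted, actual, k=3))
```

Here the selector ranks the second model first, but the third model is actually best, so Regret@1 is positive while Hit@3 still succeeds because the best model is ranked second.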