Explicit Attack on Big
[Chart: Avg Expected Queries. Top entry: gem-2.5-flash, 3 (Feb 13, 2026).]
Evaluation Results
| Method | Configuration | Date | Avg Expected Queries |
| --- | --- | --- | --- |
| gem-2.5-flash | System Setup=Orchestra... | 2026.02 | 3 |
| gpt-4.1-mini | System Setup=Orchestra... | 2026.02 | 4 |
| gem-2.5-flash | System Setup=Pure SQL... | 2026.02 | 4 |
| gpt-4.1-mini | System Setup=Orchestra... | 2026.02 | 6 |
| gpt-4.1-mini | System Setup=Orchestra... | 2026.02 | 6 |
| gpt-4.1-mini | System Setup=Pure SQL... | 2026.02 | 7 |
| gpt-4.1 | System Setup=Orchestra... | 2026.02 | 8 |
| gem-2.5-flash | System Setup=Orchestra... | 2026.02 | 9 |
| gpt-4.1 | System Setup=Pure SQL... | 2026.02 | 10 |
| sonnet-4 | System Setup=Orchestra... | 2026.02 | 10 |
| gem-2.5-flash | System Setup=Orchestra... | 2026.02 | 11 |
| sonnet-4 | System Setup=Pure SQL... | 2026.02 | 12 |
| gpt-4.1 | System Setup=Orchestra... | 2026.02 | 18 |
| gpt-4.1 | System Setup=Orchestra... | 2026.02 | 72 |
| o4-mini | System Setup=Pure SQL... | 2026.02 | 72 |
| o4-mini | System Setup=Orchestra... | 2026.02 | 100 |