Feasibility Prediction on Warehouse
[Chart: F1@1. Leading entry: Gemini 3.1 Pro Preview, 42 (May 29, 2026). Metrics shown: F1@1, F1@All, Fail F1.]
Evaluation Results

Method                   Date      F1@1   F1@All   Fail F1
Gemini 3.1 Pro Preview   2026.05   42     67       62.8
Qwen3 235B               2026.05   41     60.8     56
GPT-5.2 Instant          2026.05   35     63.4     56.9
Claude Opus 4.7          2026.05   33.3   63.2     55.7
Claude Sonnet 4.6        2026.05   33.3   64.9     59