Feasibility Prediction on Search R1
[Chart: best F1@1 is 40.2 (GPT-5.2 Instant, May 29, 2026). Metrics shown: F1@1, F1@All, Fail F1.]
Evaluation Results
| Method | Date | F1@1 | F1@All | Fail F1 |
|---|---|---|---|---|
| GPT-5.2 Instant | 2026.05 | 40.2 | 38.3 | 0 |
| Claude Opus 4.7 | 2026.05 | 39.4 | 40.5 | 5.6 |
| Claude Sonnet 4.6 | 2026.05 | 37.9 | 33.3 | 0 |
| Qwen3 235B | 2026.05 | 33.2 | 23.9 | 30.9 |
| Gemini 3.1 Pro Preview | 2026.05 | 24.5 | 24.8 | 0 |