CP on CP
[Chart: Generation Performance Score and FLOPs (10^15). Highlighted entry: Greedy, 359, May 28, 2026.]
Updated 5d ago
Evaluation Results

| Method | Settings | Date | Generation Performance Score | FLOPs (10^15) |
|---|---|---|---|---|
| Greedy | threshold (tau)=0.99,... | 2026.05 | 359 | 411.93 |
| Thompson | threshold (tau)=0.999,... | 2026.05 | 327 | 395.47 |
| EXP3.P | threshold (tau)=0.99,... | 2026.05 | 207 | 256.12 |
| Greedy | threshold (tau)=0.95,... | 2026.05 | 152 | 182.12 |
| EXP3.P | threshold (tau)=0.95,... | 2026.05 | 139 | 173.57 |
| Rand | threshold (tau)=0.99,... | 2026.05 | 138 | 170.88 |
| UCB | threshold (tau)=0.99,... | 2026.05 | 85 | 100.79 |
| Thompson | threshold (tau)=0.99,... | 2026.05 | 85 | 101.22 |
| ShinkaEvolve | threshold (tau)=0.99,... | 2026.05 | 73 | 90.15 |
| Rand | threshold (tau)=0.95,... | 2026.05 | 56 | 66.58 |
| ShinkaEvolve | threshold (tau)=0.95,... | 2026.05 | 45 | 57.9 |
| UCB | threshold (tau)=0.95,... | 2026.05 | 16 | 18.19 |
| Thompson | threshold (tau)=0.95,... | 2026.05 | 16 | 18.19 |