Code Optimization on MBPP-Hard In-Domain
[Chart: Average Calls per Task. Highlighted entry: TextBFGS-REMO, 7.1 (Jan 20, 2026). Available metrics: Average Calls per Task, Average Tokens per Call, Total Tokens per Task.]
Evaluation Results
| Method | Configuration | Date | Average Calls per Task | Average Tokens per Call | Total Tokens per Task |
|---|---|---|---|---|---|
| TextBFGS-REMO | Framework=One-Pass | 2026.01 | 7.1 | 2,262 | 16,100 |
| TextBFGS | Knowledge Retrieval=In... | 2026.01 | 8.2 | 2,103.9 | 17,200 |
| TextBFGS | Knowledge Retrieval=w/... | 2026.01 | 18.2 | 1,330.2 | 24,200 |
| TextGrad-Momentum | Strategy=Stateless Mom... | 2026.01 | 34 | 1,114.1 | 37,900 |
| TextGrad | | 2026.01 | 36.6 | 727.3 | 26,600 |
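
The three metrics in the results above appear to be related: Total Tokens per Task looks like Average Calls per Task multiplied by Average Tokens per Call, rounded to the nearest hundred. The page does not state this, so the sketch below treats it as an assumption and only checks how closely the reported figures agree. The names `rows`, `estimated_total`, and `relative_gap` are illustrative and do not come from the benchmark.

```python
# Rows copied from the results above:
# (method, configuration, avg calls per task, avg tokens per call, reported total tokens per task)
rows = [
    ("TextBFGS-REMO", "Framework=One-Pass", 7.1, 2262.0, 16100),
    ("TextBFGS", "Knowledge Retrieval=In...", 8.2, 2103.9, 17200),
    ("TextBFGS", "Knowledge Retrieval=w/...", 18.2, 1330.2, 24200),
    ("TextGrad-Momentum", "Strategy=Stateless Mom...", 34.0, 1114.1, 37900),
    ("TextGrad", "", 36.6, 727.3, 26600),
]


def estimated_total(calls: float, tokens_per_call: float) -> float:
    """Assumed relation (not stated on the page): total = calls * tokens per call."""
    return calls * tokens_per_call


def relative_gap(estimate: float, reported: float) -> float:
    """Relative difference between the estimate and the reported total."""
    return abs(estimate - reported) / reported


for method, config, calls, tokens_per_call, reported in rows:
    estimate = estimated_total(calls, tokens_per_call)
    gap = relative_gap(estimate, reported)
    label = f"{method} ({config})" if config else method
    print(f"{label}: estimated {estimate:,.0f}, reported {reported:,}, gap {gap:.2%}")
```

If the assumed relation holds, every gap should be small, which would mean a method can reach a low total either by making few calls (TextBFGS-REMO) or by keeping each call short (TextGrad).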