Code Optimization on HumanEval-Hard Cross-Domain
[Chart: Calls per Task over time. Plotted point: TextBFGS, 13.6 (Jan 20, 2026). Selectable metrics: Calls per Task, Tokens per Call, Tokens per Task.]
Evaluation Results
| Method | Knowledge Base | Date | Calls per Task | Tokens per Call | Tokens per Task (k) |
|---|---|---|---|---|---|
| TextBFGS | MBPP KB... | 2026.01 | 13.6 | 1,581.7 | 21.6 |
| TextBFGS-REMO | MBPP KB... | 2026.01 | 13.9 | 1,594.4 | 22.2 |
| TextBFGS | None | 2026.01 | 17 | 1,481.3 | 25.2 |
| TextGrad-Momentum | | 2026.01 | 29.8 | 1,464.2 | 43.7 |
| TextGrad | | 2026.01 | 35.8 | 863.9 | 30.9 |
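The three metrics appear to be related: Tokens per Task looks like Calls per Task multiplied by Tokens per Call, reported in thousands. The page does not state this, so treat it as an inference from the numbers. A minimal Python sketch that checks this reading against the rows listed above (the function names are illustrative, not part of any benchmark tooling):

```python
# Rows copied from the results above:
# (method, calls per task, tokens per call, reported tokens per task)
ROWS = [
    ("TextBFGS (MBPP KB...)", 13.6, 1581.7, 21.6),
    ("TextBFGS-REMO (MBPP KB...)", 13.9, 1594.4, 22.2),
    ("TextBFGS (Knowledge Base=None)", 17.0, 1481.3, 25.2),
    ("TextGrad-Momentum", 29.8, 1464.2, 43.7),
    ("TextGrad", 35.8, 863.9, 30.9),
]


def estimated_tokens_per_task_k(calls, tokens_per_call):
    """Assumed relation: calls x tokens per call, expressed in thousands."""
    return calls * tokens_per_call / 1000.0


def relative_gap(calls, tokens_per_call, reported_k):
    """Relative difference between the estimate and the reported value."""
    estimate = estimated_tokens_per_task_k(calls, tokens_per_call)
    return abs(estimate - reported_k) / reported_k


if __name__ == "__main__":
    for name, calls, tokens_per_call, reported_k in ROWS:
        estimate = estimated_tokens_per_task_k(calls, tokens_per_call)
        gap = relative_gap(calls, tokens_per_call, reported_k)
        print(f"{name}: estimated {estimate:.1f}k, reported {reported_k}k, gap {gap:.2%}")
```

On these five rows the estimate stays within 1% of the reported value. The small gaps could come from rounding or from averaging per task rather than multiplying two averages. Read this way, TextGrad makes the most calls but its calls are short, which is why its Tokens per Task lands below TextGrad-Momentum.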