Docstring-completion on Docstring-completion evaluation episodes (held-out)
[Chart: Policy Score by date. MechRL: 2.195 (May 25, 2026). Legend: Policy Score, Oracle Score, Gap.]
Updated 7d ago
Evaluation Results
Method | Configuration             | Date    | Policy Score | Oracle Score | Gap
MechRL | K=1, In-episode picks=... | 2026.05 | 2.195        | -            | 0.004
Oracle | Selection=per-episode,... | 2026.05 | -            | 2.198        | -