Induction on Induction evaluation episodes (held-out)
Policy Score: 3.408 (MechRL)
[Chart: Policy Score, Oracle Score, and Performance Gap; data point dated May 25, 2026]
Updated 7d ago
Evaluation Results
Method | Configuration | Date | Policy Score | Oracle Score | Performance Gap
MechRL | K=1, In-episode picks=... | 2026.05 | 3.408 | - | 0.006
Oracle | Selection=per-episode,... | 2026.05 | - | 3.402 | -