Code Execution on Multi-Agent Evaluation Set

[Chart: R@5 over time. R@5: 100; y-axis range 95 to 102.5; date shown: Jan 11, 2026]
Updated 1mo ago

Evaluation Results

Date      R@5   (unlabeled)   Method / Links
2026.01   100   0.76          -
2026.01   100   0.78          -
2026.01   100   0.85          -
2026.01   100   0.75          -
2026.01   100   0.78          -
2026.01   100   0.83          -