| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| Multi-Agent Evaluation Set | Query+ | R@5 100 | 6 | 4d ago | |
| Code Exec 5 variables (test) | | Accuracy 93.2 | 6 | 4d ago | |
| Code Exec 3 variables (test) | | Accuracy 99 | 6 | 4d ago | |
| AutoHealth Medical Benchmark Suite Tasks T1-T17 | | T1 Execution Result 1 | 5 | 4d ago | |
| Health Benchmark | | T11 | 5 | 4d ago | |
| CodeNetMut (test) | CodeExecutor | Output Accuracy 48.06 | 4 | 4d ago | |
| Tutorial (test) | CEL-S2 | Output Accuracy 79.51 | 4 | 4d ago | |
| Qwen-Agent Code Interpreter Average | MatPlotAgent | Accuracy 70.5 | 3 | 4d ago | |
| Qwen-Agent Code Interpreter Visualization-Easy | MatPlotAgent | Accuracy 68.4 | 3 | 4d ago | |
| Qwen-Agent Code Interpreter Visualization-Hard | MatPlotAgent | Accuracy 72.6 | 3 | 4d ago | |
| SingleLine (test) | CodeExecutor | Trace Accuracy 94.03 | 3 | 4d ago | |