| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Execution | Qwen-Agent Code Interpreter Average | Accuracy: 70.5 | 3 |
| Code Execution | Qwen-Agent Code Interpreter Visualization-Easy | Accuracy: 68.4 | 3 |
| Code Execution | Qwen-Agent Code Interpreter Visualization-Hard | Accuracy: 72.6 | 3 |