| Dataset Name | SOTA Method | Metric | Trend |
|---|---|---|---|
| Agent Evaluation Dataset (20 agents x 2 requirement types) | Time (min): 0.68 | 10 | 21d ago |
| Auto-ClawEval Mini (104 environments) | Safety Score: 94.2 | 8 | 1mo ago |
| Auto-ClawEval | Safety: 93.3 | 8 | 1mo ago |
| VSM full (evaluation) | Average Rating: 85.96 | 4 | 1mo ago |