Agent Capability Evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Score	Trend
SEAL 0	MiroThinker-H1	Average Score (@8)	61.3		19	3mo ago
ACEBench Agent		Multi-Step Reasoning Score	95		13	4mo ago
