Agent evaluation on VSM full (evaluation)
[Chart: Average Rating by date. Top point: Claude-Opus 4.6, 85.96, Apr 14, 2026. Metrics: Average Rating, Standard Deviation (SD), Sample Token AVG.]
Updated 1mo ago
Evaluation Results

| Method | Date | Average Rating | Standard Deviation (SD) | Sample Token AVG |
|---|---|---|---|---|
| Claude-Opus 4.6 | 2026.04 | 85.96 | 0.86 | 38,594 |
| GLM-5 | 2026.04 | 83.88 | 3.92 | 27,580 |
| Qwen3-30B-A3B-Thinking-2507 | 2026.04 | 72.66 | 2.1 | 43,482 |
| Ministral-3-14B-Reasoning-2512 | 2026.04 | 57.34 | 6.64 | 39,819 |