Terminal-related CLI agent task on CRUST-Bench
[Chart: Accuracy and Total Tokens (M). Top result: TACO, 48.05 accuracy, Apr 21, 2026.]
Evaluation Results
Method    Backbone     Date     Accuracy  Total Tokens (M)
TACO      MiniMax-2.5  2026.04  48.05     134.97
Baseline  MiniMax-2.5  2026.04  47        163.53