
VitaBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Interactive Tool-Use Agent Performance | VitaBench | Delivery Score: 65 | 44 |
| Agentic Oversight | VitaBench | Detection Accuracy: 82.13 | 42 |
| Multi-turn tool-use interaction | VitaBench | Delivery Score: 59 | 20 |
| Agent Performance | VitaBench OTA | Avg@4: 9.75 | 10 |
| Agent Performance | VitaBench In-Store | Avg@4: 32.25 | 10 |
| Agent Performance | VitaBench Delivery | Avg@4: 29 | 10 |
| Agentic task | VitaBench Delivery | Avg@2: 30.74 | 8 |
| Agentic task | VitaBench In-store | Avg@2: 34.62 | 8 |