
SEAL

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Reasoning over conflicting evidence | SEAL-0 | Accuracy 45.9 | 14
Deep Search | SEAL 0 | Score 41.44 | 11
Agent Capability Evaluation | SEAL 0 | Score 53.4 | 9
Agent Tool-use and Reasoning | SEAL (test) | Pass@3 51.97 | 8