Task Completion Classification on SARM (real-world rollouts)

Average Accuracy: 92.8

GRM-8B

Dec 29, 2025
Updated 4d ago

Evaluation Results

Date      Average Accuracy   Links
2025.12   92.8               ---
2025.12   83.9               ---
2025.12   83.9               ---
2025.12   81.1               ---
2025.12   76.7               ---
2025.12   61.7               ---
2025.12   37.2               ---
2025.12   33.9               ---