
10 tasks

Benchmarks

Task Name	Dataset Name	SOTA Result	Trend
Large Language Model Evaluation	10 tasks average	Avg Accuracy: 70.56	50




© 2026 wizwand
