
CounterBench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Logical Reasoning | CounterBench (test) | Accuracy: 88.9 | 55
Reasoning | CounterBench | Error Rate: 0.0359 | 11