General Reasoning on Open-Platypus (test)
Top result: Latent-GRPO, 78.06 Accuracy
[Chart: Accuracy and Latency (ms) by date; point shown at Jan 13, 2026]
Updated 4d ago
Evaluation Results
| Method | Model Scale | Date | Accuracy | Latency (ms) |
|---|---|---|---|---|
| Latent-GRPO | Qwen3-4B | 2026.01 | 78.06 | 1,632.52 |
| LLM-as-Judge | Qwen3-4B | 2026.01 | 65.21 | 3,522.18 |
| Latent-GRPO | Qwen3-1.7B | 2026.01 | 64.82 | 1,218.92 |
| LLM-as-Judge | Qwen3-1.7B | 2026.01 | 56.69 | 2,573.41 |
| Latent-GRPO | Qwen3-0.6B | 2026.01 | 40.56 | 1,079.27 |
| LLM-as-Judge | Qwen3-0.6B | 2026.01 | 34.45 | 1,937.82 |