Mathematical Reasoning on GSM8K (test) (Acc, Time)

82.34 Accuracy

Latent-GRPO

Updated 4mo ago

Evaluation Results

Method	Links	Acc	Time
Latent-GRPO 2026.01		82.34	658.21
Rule-based 2026.01		79.87	651.45
Latent-GRPO 2026.01		73.88	492.34
LLM-as-Judge 2026.01		72.12	1,411.72
Rule-based 2026.01		71.55	488.63
LLM-as-Judge 2026.01		64.2	1,032.55
Latent-GRPO 2026.01		61.25	431.18
Rule-based 2026.01		58.41	434.61
LLM-as-Judge 2026.01		53.52	768.42