
Reasoning on Omni Hard

Accuracy: 18.58

Cog-DRIFT

Updated 3mo ago

Evaluation Results

Method	Date	Links	Score
Cog-DRIFT	2026.04		18.58
GRPO	2026.04		16.37
Few-shot	2026.04		16.31
NuRL (Abstract)	2026.04		16.03
NuRL (Prefix)	2026.04		15.69
Zero-shot	2026.04		15.67
RFT	2026.04		15.13
NuRL (Prefix)	2026.04		7.19
Cog-DRIFT	2026.04		7.04
NuRL (Abstract)	2026.04		6.67
RFT	2026.04		6.52
Few-shot	2026.04		5.74
Zero-shot	2026.04		5.09
GRPO	2026.04		4.32