RO reformulation on Hard (Out-of-Distribution)

94.8 Accuracy

AutoREM

Updated 2mo ago

Evaluation Results

Method	Links	Accuracy
AutoREM 2026.05		94.8	6,944
Max Thinking 2026.05		83.3	14,902
Expert Prompt 2026.05		83.3	7,549
ACE 2026.05		81.3	5,238
ReasoningBank 2026.05		80.2	8,089
Base LLM 2026.05		70.8	9,026