RO reformulation on Random In-Distribution
[Chart: top entry AutoREM, Accuracy 97.4, May 12, 2026. Available metrics: Accuracy, Output Token Count.]
Updated 21d ago
Evaluation Results
Method          Details                      Date      Accuracy   Output Token Count
AutoREM         Base LLM=DeepSeek-V4-F...    2026.05   97.4       6,386
Expert Prompt   Base LLM=DeepSeek-V4-F...    2026.05   92.7       8,750
Max Thinking    Base LLM=DeepSeek-V4-F...    2026.05   90.6       13,857
Base LLM        Base LLM=DeepSeek-V4-F...    2026.05   87.5       7,777
ACE             Base LLM=DeepSeek-V4-F...    2026.05   87         4,419
ReasoningBank   Base LLM=DeepSeek-V4-F...    2026.05   85.4       6,575