Single-turn Reasoning on AIME, GPQA, MMLU-Pro, ToolBench Aggregate

Average Score: 65

ReMem

Nov 25, 2025
Updated 15d ago

Evaluation Results

Date      Score
2025.11   65
2025.11   63
2025.11   60
2025.11   59
2025.11   59
2025.11   59
2025.11   59
2025.11   59
2025.11   58
2025.11   58
2025.11   58
2025.11   58
2025.11   57
2025.11   56
2025.11   56
2025.11   56
2025.11   56
2025.11   55
2025.11   55
2025.11   55
2025.11   55
2025.11   54
2025.11   54
2025.11   52
2025.11   52
2025.11   49
2025.11   48
2025.11   37