
General Reasoning: Average Accuracy on an Aggregate of OBQA, CSQA, SIQA, ARC, MMLU, GSM8K-MC, and AQUA

Top average accuracy: 86.21 (IoT)

Updated 3mo ago
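The page does not say how the seven benchmark scores are combined into one "average accuracy". The sketch below assumes an unweighted (macro) mean, where each benchmark counts equally regardless of how many questions it has. The function name `aggregate_accuracy` is hypothetical and is not part of any leaderboard tooling.

```python
from statistics import fmean

# Benchmarks named in the leaderboard title.
BENCHMARKS = ("OBQA", "CSQA", "SIQA", "ARC", "MMLU", "GSM8K-MC", "AQUA")


def aggregate_accuracy(per_benchmark: dict[str, float]) -> float:
    """Unweighted (macro) mean of per-benchmark accuracies, in percent.

    Assumption: every benchmark carries equal weight; the leaderboard
    itself does not document its aggregation rule.
    """
    missing = [name for name in BENCHMARKS if name not in per_benchmark]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    # Average over the seven named benchmarks only, ignoring any extras.
    return round(fmean(per_benchmark[name] for name in BENCHMARKS), 2)
```

A micro average, which weights each benchmark by its number of test items, would generally give a different number. The two rules agree only when all benchmarks are the same size.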

Evaluation Results

Method	Date	Average accuracy (%)
IoT	2026.03	86.21
CoT	2026.03	85.23
IoT	2026.03	80.69
SC	2026.03	80.04
CoT	2026.03	78.75
IoT	2026.03	77.35
EoT	2026.03	76.13
CoT	2026.03	75.38
IoT	2026.03	75.22
SC	2026.03	74.89
SC	2026.03	74.81
EoT	2026.03	72.38
CoT	2026.03	72.28
EoT	2026.03	71.93
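Each method appears several times in the results table, and the page does not record what distinguishes those entries. To compare methods at a glance, this small sketch keeps only each method's best score, using the values exactly as listed in the table. The helper name `best_per_method` is hypothetical.

```python
# (method, 2026.03 tag, average accuracy %) rows copied from the results table.
ROWS = [
    ("IoT", "2026.03", 86.21), ("CoT", "2026.03", 85.23),
    ("IoT", "2026.03", 80.69), ("SC", "2026.03", 80.04),
    ("CoT", "2026.03", 78.75), ("IoT", "2026.03", 77.35),
    ("EoT", "2026.03", 76.13), ("CoT", "2026.03", 75.38),
    ("IoT", "2026.03", 75.22), ("SC", "2026.03", 74.89),
    ("SC", "2026.03", 74.81), ("EoT", "2026.03", 72.38),
    ("CoT", "2026.03", 72.28), ("EoT", "2026.03", 71.93),
]


def best_per_method(rows: list[tuple[str, str, float]]) -> dict[str, float]:
    """Return each method's highest accuracy, sorted from best to worst."""
    best: dict[str, float] = {}
    for method, _tag, accuracy in rows:
        best[method] = max(accuracy, best.get(method, float("-inf")))
    return dict(sorted(best.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for method, accuracy in best_per_method(ROWS).items():
        print(f"{method}\t{accuracy:.2f}")
```

Run on the table's rows, this yields IoT 86.21, CoT 85.23, SC 80.04, and EoT 76.13. Taking the maximum hides the spread within each method. That spread ranges from 71.93 to 86.21 across all entries.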