
Multistep Reasoning on SpokenMQA

Top accuracy: 81.5, achieved by Baseline (Thinking)

Updated 4mo ago

Evaluation Results

Method	Date	Links	Accuracy		
Baseline (Thinking)	2025.12		81.5	227.16	227.15
AsyncReasoning	2025.12		81	0.84	0.8
Baseline (No think)	2025.12		80.6	0.8	0.7
Baseline (Thinking)	2025.12		70.4	201.53	201.52
AsyncReasoning	2025.12		65.7	2.56	1.37
Baseline (No think)	2025.12		56.8	0.72	0.67