
Multi-turn conversation performance on Math

Best Avg Performance: 94.5 (Full)

Updated 5mo ago

Evaluation Results

Method	Links	Avg Performance
Full 2026.02		94.5	89.4
Full 2026.02		94	81.6
Full 2026.02		87.2	70.9
Experience-Driven Mediator 2026.02		86.3	67.3
Experience-Driven Mediator 2026.02		80.6	62
Sharded 2026.02		78.8	56.3
Experience-Driven Mediator 2026.02		77.7	70.4
Sharded 2026.02		69.6	48.6
Sharded 2026.02		64.9	45.6