Multi-turn Dialogue Reasoning on MultiChallenge
Best result: TableLong, 32.97 Accuracy
[Chart: Accuracy by date (Mar 23, 2026)]
Updated 25d ago
Evaluation Results
Method             Details                    Date     Accuracy
TableLong          Base Model=DS-R1-Disti...  2026.03  32.97
DS-R1-Distill-32B  Model=DS-R1-Distill-32B    2026.03  30.28
TableLong          Base Model=DS-R1-Disti...  2026.03  26.86
DS-R1-Distill-14B  Model=DS-R1-Distill-14B    2026.03  23.08