
Method Ranking Self-Consistency on the Combined benchmark (M = 120 questions)

Mean Kendall's Tau-b: 0.8925

nanson_rank_ties_average
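Kendall's Tau-b compares two rankings of the same items and corrects for ties: τ_b = (n_c − n_d) / √((n_0 − n_1)(n_0 − n_2)), where n_c and n_d count concordant and discordant pairs, n_0 = n(n − 1)/2, and n_1, n_2 count the pairs tied in the first and second ranking. The page does not say which two rankings are compared for each question. The sketch below assumes one pair of method rankings per question and a plain average over the M questions; the function names are hypothetical, and the O(n²) pair loop favours clarity over speed.

```python
import math
from itertools import combinations
from statistics import mean


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length score lists (ties allowed)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs not tied in x (n0 - n1) / not tied in y (n0 - n2)
    for i, j in combinations(range(len(x)), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
        dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
        if dx != 0:
            untied_x += 1
        if dy != 0:
            untied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = math.sqrt(untied_x * untied_y)
    if denom == 0:
        return float("nan")  # undefined when either ranking is entirely tied
    return (concordant - discordant) / denom


def mean_tau_b(ranking_pairs):
    """Average tau-b over (x, y) pairs, e.g. one pair per question (assumed setup)."""
    return mean([kendall_tau_b(x, y) for x, y in ranking_pairs])
```

For example, kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 4]) has 5 concordant pairs, 0 discordant pairs, 5 pairs untied in x and 6 untied in y, giving 5/√30 ≈ 0.9129.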

[Figure: Mean Kendall's Tau-b chart; Mar 11, 2026]
Updated 1 month ago.
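The top entry, nanson_rank_ties_average, is not described on this page. One reading of the name, which is an assumption, is Nanson's rule (repeatedly drop every candidate whose Borda score is at or below the mean) applied with methods as candidates and per-question scores as ballots, where a tie inside a ballot earns half a point per tied rival so that tied methods share the average of their positions. The sketch below builds a full ranking by extracting Nanson winners one round at a time; all function names are hypothetical.

```python
from statistics import mean


def borda_scores(ballots, candidates):
    """Borda points restricted to `candidates`. Each ballot maps candidate -> score
    (higher is better): 1 point per rival beaten, 0.5 per rival tied."""
    points = {c: 0.0 for c in candidates}
    for ballot in ballots:
        for c in candidates:
            for rival in candidates:
                if rival == c:
                    continue
                if ballot[c] > ballot[rival]:
                    points[c] += 1.0
                elif ballot[c] == ballot[rival]:
                    points[c] += 0.5
    return points


def nanson_winners(ballots, candidates):
    """Nanson's rule: repeatedly drop every candidate at or below the mean Borda score."""
    remaining = set(candidates)
    while len(remaining) > 1:
        points = borda_scores(ballots, remaining)
        avg = mean(points.values())
        survivors = {c for c in remaining if points[c] > avg}
        if not survivors:  # everyone sits exactly at the mean: an unbreakable tie
            break
        remaining = survivors
    return remaining


def nanson_rank(ballots, candidates):
    """Full ranking by repeated Nanson winner extraction; tied winners share the average rank."""
    ranks = {}
    remaining = set(candidates)
    next_pos = 1
    while remaining:
        winners = nanson_winners(ballots, remaining)
        shared_rank = next_pos + (len(winners) - 1) / 2  # mean of the positions they span
        for w in winners:
            ranks[w] = shared_rank
        next_pos += len(winners)
        remaining -= winners
    return ranks
```

For example, with the ballots {"A": 3, "B": 2, "C": 1}, {"A": 3, "B": 1, "C": 2} and {"A": 1, "B": 3, "C": 2}, the Borda totals are A = 4, B = 3, C = 2, so A alone is above the mean of 3 and wins the first round; nanson_rank then returns {"A": 1.0, "B": 2.0, "C": 3.0}.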

Evaluation Results

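| Method | Date | Mean Kendall's Tau-b | ± |
|---|---|---|---|
| nanson_rank_ties_average | 2026.03 | 0.8925 | 0.0497 |
| | 2026.03 | 0.8831 | 0.037 |
| | 2026.03 | 0.8669 | 0.0589 |
| | 2026.03 | 0.8664 | 0.0492 |
| | 2026.03 | 0.8654 | 0.0489 |
| | 2026.03 | 0.8654 | 0.0489 |
| | 2026.03 | 0.8654 | 0.0489 |
| | 2026.03 | 0.8654 | 0.0489 |
| | 2026.03 | 0.8654 | 0.0489 |
| | 2026.03 | 0.8654 | 0.0489 |
| | 2026.03 | 0.8654 | 0.0489 |
| | 2026.03 | 0.8648 | 0.0417 |
| | 2026.03 | 0.8647 | 0.0486 |
| | 2026.03 | 0.8647 | 0.0486 |
| | 2026.03 | 0.8647 | 0.0486 |
| | 2026.03 | 0.8647 | 0.0486 |
| | 2026.03 | 0.8647 | 0.0486 |
| | 2026.03 | 0.8647 | 0.0486 |
| | 2026.03 | 0.8647 | 0.0486 |
| | 2026.03 | 0.8647 | 0.0486 |
| | 2026.03 | 0.8646 | 0.0486 |
| | 2026.03 | 0.8646 | 0.0499 |
| | 2026.03 | 0.8623 | 0.0491 |
| | 2026.03 | 0.8074 | 0.0507 |
| | 2026.03 | 0.8064 | 0.0309 |
| | 2026.03 | 0.8063 | 0.0507 |
| | 2026.03 | 0.7963 | 0.0454 |
| | 2026.03 | 0.7963 | 0.0454 |
| | 2026.03 | 0.7963 | 0.0454 |
| | 2026.03 | 0.7655 | 0.0455 |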
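Many results share the same displayed mean (seven at 0.8654, eight at 0.8647, three at 0.7963). If rank positions are wanted for these results, one common convention is average (fractional) ranking, where each tied group receives the mean of the positions it spans; scipy.stats.rankdata with method="average" applies the same rule to ascending data. The values are rounded to four decimals, so displayed ties may hide small differences. A minimal sketch, using the scores in the order listed (the function name is hypothetical):

```python
def average_ranks(scores):
    """Fractional ranks with the best score at rank 1; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])  # stable, descending
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` over every following entry with exactly the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Mean Kendall's Tau-b values in the order listed above.
TABLE_SCORES = (
    [0.8925, 0.8831, 0.8669, 0.8664]
    + [0.8654] * 7
    + [0.8648]
    + [0.8647] * 8
    + [0.8646, 0.8646, 0.8623, 0.8074, 0.8064, 0.8063]
    + [0.7963] * 3
    + [0.7655]
)
ranks = average_ranks(TABLE_SCORES)
```

Under this convention the 0.8654 group shares rank 8, the 0.8647 group rank 16.5, the two 0.8646 results rank 21.5, and the three 0.7963 results rank 28.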