Model Ranking on TruthfulQA BERTScore (test)

Kendall's Tau: 0.45

Adaptive Multi-Model Ranking

[Chart omitted; date shown: Jan 20, 2026]
Updated 1mo ago

Evaluation Results

MethodLinks
2026.01
0.45922.8
2026.01
0.19--