
Truthfulness Evaluation on TruthfulQA

Reliability Score: 16.9

Aligner

-1.9242.9637.8512.737
Feb 4, 2024

Evaluation Results

Date     Reliability Score
2024.02  16.9               -      -
2024.02  13                 -      -
2024.02  11.8               -      -
2024.02  10.8               -      -
2024.02  10.3               -      -
2024.02  10.3               -      -
2024.02  10                 -      -
2024.02  9.1                -      -
2024.02  7.6                -      -
2024.02  7.1                -      -
2024.02  6.6                -      -
2024.02  6.1                -      -
2024.02  5.4                -      -
2024.02  5.1                -      -
2024.02  4.9                -      -
2024.02  4.2                -      -
2024.02  3.9                -      -
2024.02  3.2                -      -
2024.02  2.7                -      -
2024.02  2.7                -      -
2024.02  2                  -      -
2024.02  1.7                -      -
2024.02  1.5                -      -
2024.02  1.2                -      -
2024.02  1                  -      -
2024.02  0.7                -      -
2024.02  0.7                -      -
2024.02  0.5                -      -
2024.02  0.5                -      -
2024.02  0                  -      -
2024.02  -0.2               -      -
2024.02  -0.5               -      -
2024.02  -1.2               -      -
2024.02  -                  31.6   -
2024.02  -                  42.7   -
2024.02  -                  27.2   -
2024.02  -                  53     -
2024.02  -                  26.3   -
2024.02  -                  51.7   -
2024.02  -                  41.2   -
2024.02  -                  52     -
2024.07  -                  44     -
2024.07  -                  36     -
2024.07  -                  41     -
2024.07  -                  42     -
2024.07  -                  45     -
2025.05  -                  62     100
2025.05  -                  64     98
2025.05  -                  74     99
2025.05  -                  76     99
2025.10  -                  64.79  -
2025.10  -                  66.93  -
2025.10  -                  69.05  -
2025.10  -                  65.58  -
2025.10  -                  69.09  -
2025.10  -                  70.02  -