Long-form Factuality Calibration on FactScore
Top result: DINCO, ECE 0.076 (Sep 29, 2025)
Metrics: ECE, Calibration BS, AUC, Pearson Correlation (r), Spearman Correlation (rho)
Evaluation Results
| Method | Model | Date | ECE | Calibration BS | AUC | Pearson Correlation (r) | Spearman Correlation (rho) |
|---|---|---|---|---|---|---|---|
| DINCO | Qwen3-8B | 2025.09 | 0.076 | 0.202 | 0.767 | 0.518 | 0.538 |
| NVC | Gemma-3-4B-IT | 2025.09 | 0.123 | 0.23 | 0.726 | 0.695 | 0.704 |
| SC | Qwen3-8B | 2025.09 | 0.162 | 0.226 | 0.771 | 0.468 | 0.494 |
| DINCO | Gemma-3-4B-IT | 2025.09 | 0.172 | 0.21 | 0.793 | 0.724 | 0.712 |
| NVC | Qwen3-8B | 2025.09 | 0.191 | 0.263 | 0.681 | 0.444 | 0.443 |
| SC | Gemma-3-4B-IT | 2025.09 | 0.197 | 0.233 | 0.787 | 0.629 | 0.607 |
| VC | Qwen3-8B | 2025.09 | 0.433 | 0.431 | 0.625 | 0.073 | 0.122 |
| VC | Gemma-3-4B-IT | 2025.09 | 0.527 | 0.527 | 0.683 | -0.081 | -0.129 |
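The page reports calibration metrics without defining them. As a rough guide to how such numbers are typically obtained, the sketch below computes three of them from per-claim confidence scores and binary correctness labels. It rests on several assumptions: "Calibration BS" is read as the Brier score, ECE uses 10 equal-width bins, and the function names and toy data are hypothetical and not taken from the benchmark. The leaderboard's actual evaluation code may differ.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(conf: Sequence[float],
                               correct: Sequence[int],
                               n_bins: int = 10) -> float:
    """ECE with equal-width bins (assumed binning scheme):
    size-weighted mean of |accuracy - mean confidence| over non-empty bins."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    n = len(conf)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


def brier_score(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared gap between confidence and the 0/1 correctness label."""
    return sum((c - y) ** 2 for c, y in zip(conf, correct)) / len(conf)


def auc(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Probability that a correct claim receives higher confidence than an
    incorrect one; ties count as half. Needs at least one of each label."""
    pos = [c for c, y in zip(conf, correct) if y == 1]
    neg = [c for c, y in zip(conf, correct) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up toy data: six claims with confidences and correctness labels.
    toy_conf = [0.95, 0.85, 0.75, 0.65, 0.35, 0.25]
    toy_correct = [1, 1, 0, 1, 0, 0]
    print("ECE:", round(expected_calibration_error(toy_conf, toy_correct), 3))
    print("Brier:", round(brier_score(toy_conf, toy_correct), 3))
    print("AUC:", round(auc(toy_conf, toy_correct), 3))
```

Lower is better for ECE and the Brier score, while higher is better for AUC and the two correlation columns, which is consistent with the table being ordered by ascending ECE.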