Long-form factuality evaluation on FactBench
[Chart: Accuracy on FactBench; top result CURE at 84.4 (Apr 13, 2026). Metric views: Accuracy, ECE, Brier Score, AUROC.]
Evaluation Results
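| Method          | Setting                    | Date    | Accuracy (%) ↑ | ECE ↓ | Brier Score ↓ | AUROC ↑ |
|-----------------|----------------------------|---------|----------------|-------|---------------|---------|
| CURE            | Pipeline Stage=Factual...  | 2026.04 | 84.4           | 0.136 | 0.195         | 0.667   |
| SFT + F-RL + CO | Pipeline Stage=Calibra...  | 2026.04 | 82.5           | 0.159 | 0.214         | 0.648   |
| SFT             | Pipeline Stage=Supervi...  | 2026.04 | 82             | 0.137 | 0.184         | 0.563   |
| SFT + F-RL      | Pipeline Stage=Feasibi...  | 2026.04 | 78.1           | 0.157 | 0.2           | 0.605   |
| L2RF            |                            | 2026.04 | 77.1           | -     | -             | -       |
| Base LLM        | Backbone=Llama3.1-8B-I...  | 2026.04 | 73.6           | 0.162 | 0.219         | 0.541   |
| LitCab          |                            | 2026.04 | 73.6           | 0.212 | 0.276         | 0.575   |

↑ higher is better; ↓ lower is better. "-" means the score is not reported.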
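The table reports accuracy alongside three calibration metrics (ECE, Brier Score, AUROC). As a rough guide to what those columns measure, here is a minimal sketch that computes all four from per-item confidence scores and binary correctness labels using the standard textbook definitions. It assumes each evaluated item (for example, one claim in a long-form answer) carries a single confidence in [0, 1] and a 0/1 correctness label, that ECE uses 10 equal-width bins, and that AUROC measures how well confidence separates correct from incorrect items. The function names and these choices are illustrative assumptions, not FactBench's or CURE's actual evaluation code.

```python
from typing import Sequence


def accuracy(correct: Sequence[int]) -> float:
    """Fraction of items labeled correct (1); the leaderboard shows this as a percentage."""
    return sum(correct) / len(correct)


def brier_score(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared gap between confidence and the 0/1 correctness label (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(conf, correct)) / len(conf)


def expected_calibration_error(
    conf: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """ECE over equal-width confidence bins (lower is better).

    Assumption: 10 equal-width bins; the benchmark may bin differently.
    """
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(conf, correct):
        # Confidence in [0, 1] -> bin index; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(conf)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p for p, _ in b) / len(b)
        bin_acc = sum(y for _, y in b) / len(b)
        # Weight each bin's |accuracy - confidence| gap by its share of items.
        ece += (len(b) / n) * abs(bin_acc - avg_conf)
    return ece


def auroc(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Chance a random correct item gets higher confidence than a random incorrect one.

    Ties count as 0.5 (the Mann-Whitney formulation of AUROC).
    """
    pos = [p for p, y in zip(conf, correct) if y == 1]
    neg = [p for p, y in zip(conf, correct) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both correct and incorrect items")
    wins = sum(
        1.0 if p > q else (0.5 if p == q else 0.0) for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))
```

For example, with confidences [0.9, 0.8, 0.6, 0.3] and labels [1, 1, 0, 0], these functions give accuracy 0.5, Brier score 0.125, ECE 0.3 (10 bins), and AUROC 1.0: confidence ranks every correct item above every incorrect one, yet the confidences themselves are still miscalibrated. This is why the leaderboard reports discrimination (AUROC) and calibration (ECE, Brier) separately.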