Long-form factuality evaluation on LongFact
[Chart: Accuracy by date; single data point shown at Apr 13, 2026. Top result: CURE, 90.2 Accuracy. Selectable metrics: Accuracy, Expected Calibration Error (ECE), Brier Score, Area Under ROC Curve (AUROC).]
Evaluation Results

| Method | Tags | Date | Accuracy | Expected Calibration Error (ECE) | Brier Score | Area Under ROC Curve (AUROC) |
|---|---|---|---|---|---|---|
| CURE | Pipeline Stage=Factual... | 2026.04 | 90.2 | 14.4 | 16.8 | 66.9 |
| SFT + F-RL + CO | Pipeline Stage=Calibra... | 2026.04 | 89.8 | 14.6 | 16.9 | 65.6 |
| SFT | Pipeline Stage=Supervi... | 2026.04 | 85.1 | 11.8 | 14.6 | 65.5 |
| SFT + F-RL | Pipeline Stage=Feasibi... | 2026.04 | 83.7 | 13.5 | 15.5 | 59.8 |
| L2RF | - | 2026.04 | 79.4 | - | - | - |
| Base LLM | Backbone=Llama3.1-8B-I... | 2026.04 | 74.5 | 15.8 | 19.3 | 59.1 |
| LitCab | - | 2026.04 | 74.5 | 15.6 | 19 | 60.1 |
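
For readers unfamiliar with the calibration metrics reported in the results above, here is a minimal sketch of how ECE, Brier Score, and AUROC are commonly computed from per-claim confidences and binary correctness labels. It is not the evaluation code behind this leaderboard. The function names, the 10 equal-width bins, and the toy data are all assumptions made for illustration. The leaderboard values also appear to be scaled by 100, whereas this sketch returns values in [0, 1].

```python
from typing import Sequence


def brier_score(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared gap between confidence and the 0/1 correctness label."""
    return sum((c - y) ** 2 for c, y in zip(conf, correct)) / len(conf)


def expected_calibration_error(
    conf: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """ECE with equal-width confidence bins (the binning scheme is an assumption)."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    total = len(conf)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece


def auroc(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Probability that a correct item outranks an incorrect one (ties count 0.5)."""
    pos = [c for c, y in zip(conf, correct) if y == 1]
    neg = [c for c, y in zip(conf, correct) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect item")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four claims with confidences and correctness labels.
confidences = [0.9, 0.8, 0.3, 0.6]
labels = [1, 1, 0, 0]
print(brier_score(confidences, labels))
print(expected_calibration_error(confidences, labels))
print(auroc(confidences, labels))
```

Lower is better for ECE and Brier Score, while higher is better for Accuracy and AUROC. This is why a method can lead on Accuracy without leading on every calibration column.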