Factual Consistency Evaluation on 2,055 code summary evaluations
[Chart: Time (s) and Cost by method, Apr 12, 2026]
Updated 4d ago
Evaluation Results
All entries carry the tag 2026.04.

Method       Time (s)   Cost
ROUGE-1      0          0
ROUGE-2      0          0
ROUGE-L      0          0
METEOR       0.02       0
SBCS         0.02       0
BERTScore    0.03       0
SBED         0.03       0
SIDE         0.08       0
LLM-judge    0.34       0.0002
CODERPE      0.53       0.0002
G-Eval       1.11       0.0002
BLEU         1.12       0
Factscore    9.18       0.0058
ReFEree      10.24      0.0042