Text Naturalness Evaluation on DeepSeek-R1 Experiment 2
[Chart: scores by method; x-axis date Apr 8, 2026; series: BERT Score, GPT-2 Score, RoBERTa Score, DistilBERT Score, Average Score]
Updated 9d ago
Evaluation Results
All entries: Metric=detectability A..., 2026.04

Method          BERT Score  GPT-2 Score  RoBERTa Score  DistilBERT Score  Average Score
Template-based  0.99        0.98         0.98           0.99              0.84
LLM-dependent   0.63        0.65         0.64           0.68              0.6
LLM+ID          0.61        0.63         0.62           0.66              0.59
LLM+CA          0.6         0.62         0.61           0.65              0.58
iTAG            0.56        0.58         0.57           0.61              0.55
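
As a rough aid for reading the scores listed above, the following sketch orders the methods by their reported Average Score and, for comparison, computes the plain arithmetic mean of the four detector columns. The dictionary layout and the helper names `detector_mean` and `ranking_by_reported_average` are illustrative assumptions, not part of the benchmark page. The page does not say how Average Score is derived, so the comparison is only a sanity check.

```python
from statistics import mean

# Scores copied from the results above, in the order
# (BERT, GPT-2, RoBERTa, DistilBERT, reported Average).
# The dict layout and helper names are assumptions for illustration.
RESULTS = {
    "Template-based": (0.99, 0.98, 0.98, 0.99, 0.84),
    "LLM-dependent": (0.63, 0.65, 0.64, 0.68, 0.60),
    "LLM+ID": (0.61, 0.63, 0.62, 0.66, 0.59),
    "LLM+CA": (0.60, 0.62, 0.61, 0.65, 0.58),
    "iTAG": (0.56, 0.58, 0.57, 0.61, 0.55),
}


def detector_mean(method):
    """Plain arithmetic mean of the four detector columns for one method."""
    *detectors, _reported = RESULTS[method]
    return round(mean(detectors), 4)


def ranking_by_reported_average():
    """Method names sorted ascending by the reported Average Score."""
    return sorted(RESULTS, key=lambda m: RESULTS[m][-1])


for method in ranking_by_reported_average():
    reported = RESULTS[method][-1]
    print(f"{method:15s} reported={reported:.2f} "
          f"mean_of_four={detector_mean(method):.4f}")
```

In every row the reported Average Score is lower than the plain mean of the four detector columns (for example, 0.84 versus 0.985 for Template-based), so the reported average presumably includes something not shown on this page.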