Stylized Dialogue on 148-query Average across 9 styles (test)
[Chart: Context Relevance by method; plotted point: SFR, 4.463, May 27, 2026. Selectable metrics: Context Relevance, Relation Score, Style Score, Fluency.]
Evaluation Results
Method              Settings                   Date     Context Relevance  Relation Score  Style Score  Fluency
SFR                 Temperature=0.8, Backb...  2026.05  4.463              4.666           4.401        4.881
SFT                 Temperature=0.8, Backb...  2026.05  4.427              4.661           4.243        4.86
DeepSeek-R1-prompt  Temperature=0.8, Backb...  2026.05  4.368              4.604           4.329        4.811