Conversational Response Generation on Bitext Retail Banking LLM Chatbot (test)
[Chart: BLEU over time. Top result: SFT Model, BLEU 26.85 (Feb 10, 2026)]
Updated 4d ago
Evaluation Results
| Method | Setting | Date | BLEU | ROUGE-1 | ROUGE-2 | ROUGE-L | ChrF | BERTScore |
|---|---|---|---|---|---|---|---|---|
| SFT Model | Prompting=0-shot, Alig... | 2026.02 | 26.85 | 58.34 | 32.81 | 41.28 | 52.1731 | 91.46 |
| SFT Model | Prompting=2-shot, Alig... | 2026.02 | 25.49 | 57.03 | 30.88 | 39.01 | 53.627 | 91.1 |
| Base Model | Prompting=2-shot, Alig... | 2026.02 | 2.73 | 34.4 | 9.8 | 17.84 | 37.0881 | 83.39 |
| Base Model | Prompting=0-shot, Alig... | 2026.02 | 2.7 | 34.18 | 9.08 | 17.87 | 35.1978 | 83.93 |
| GPT-5 | Prompting=2-shot | 2026.02 | 1.37 | 36.33 | 8.06 | 18.69 | 33.6562 | 84.76 |