Multi-turn Dialogue Evaluation on MT-Eval (Expansion, Follow-up, Recollection, Refinement)
[Chart: Expansion Score over time. Top entry: SFT, 7.34 (Aug 1, 2024). Selectable metrics: Expansion Score, Follow-up Score, Recollection Score, Refinement Score, Average Score.]
Evaluation Results
| Method | Backbone    | Date    | Expansion Score | Follow-up Score | Recollection Score | Refinement Score | Average Score |
|--------|-------------|---------|-----------------|-----------------|--------------------|------------------|---------------|
| SFT    | Qwen2.5-14B | 2024.08 | 7.34            | 8.1             | 6.24               | 5.68             | 6.84          |
| ISM    | Qwen2.5-14B | 2024.08 | 7.03            | 8.37            | 6.63               | 5.96             | 7             |
| ISM    | LLaMA3.1-8B | 2024.08 | 6.8             | 7.15            | 5.55               | 5.17             | 6.17          |
| SFT    | Qwen2.5-7B  | 2024.08 | 6.63            | 7.71            | 5.63               | 5.29             | 6.32          |
| ISM    | Qwen2.5-7B  | 2024.08 | 6.57            | 7.72            | 5.95               | 5.36             | 6.4           |
| Direct | Qwen2.5-14B | 2024.08 | 6.43            | 7.55            | 5.32               | 4.31             | 5.9           |
| SFT    | LLaMA3.1-8B | 2024.08 | 6.21            | 7.1             | 5.13               | 5.03             | 5.87          |
| Direct | Qwen2.5-7B  | 2024.08 | 4.7             | 6.03            | 3.84               | 2.92             | 4.37          |
| Direct | LLaMA3.1-8B | 2024.08 | 3.66            | 4.23            | 1.39               | 2.09             | 2.84          |
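The page does not say how the Average Score is derived. The reported values appear consistent with an unweighted mean of the four task scores (Expansion, Follow-up, Recollection, Refinement). The sketch below checks that assumption against the rows listed above; the names `ROWS`, `mean_of_tasks`, and `max_deviation` are illustrative and not part of any MT-Eval tooling.

```python
# Rows copied from the leaderboard:
# (method, backbone, expansion, follow_up, recollection, refinement, reported_average)
ROWS = [
    ("SFT",    "Qwen2.5-14B", 7.34, 8.10, 6.24, 5.68, 6.84),
    ("ISM",    "Qwen2.5-14B", 7.03, 8.37, 6.63, 5.96, 7.00),
    ("ISM",    "LLaMA3.1-8B", 6.80, 7.15, 5.55, 5.17, 6.17),
    ("SFT",    "Qwen2.5-7B",  6.63, 7.71, 5.63, 5.29, 6.32),
    ("ISM",    "Qwen2.5-7B",  6.57, 7.72, 5.95, 5.36, 6.40),
    ("Direct", "Qwen2.5-14B", 6.43, 7.55, 5.32, 4.31, 5.90),
    ("SFT",    "LLaMA3.1-8B", 6.21, 7.10, 5.13, 5.03, 5.87),
    ("Direct", "Qwen2.5-7B",  4.70, 6.03, 3.84, 2.92, 4.37),
    ("Direct", "LLaMA3.1-8B", 3.66, 4.23, 1.39, 2.09, 2.84),
]


def mean_of_tasks(row):
    """Unweighted mean of the four task scores (assumed aggregation rule)."""
    return sum(row[2:6]) / 4


def max_deviation(rows):
    """Largest absolute gap between the recomputed mean and the reported average."""
    return max(abs(mean_of_tasks(r) - r[6]) for r in rows)


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row[0]:<6} {row[1]:<12} recomputed={mean_of_tasks(row):.4f} reported={row[6]}")
    print(f"max deviation: {max_deviation(ROWS):.4f}")
```

Every recomputed mean lands within 0.01 of the reported value, which fits rounding to two decimals. The displayed task scores are themselves rounded, so the check cannot rule out a different aggregation rule applied to unrounded scores.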