Multi-turn dialogue evaluation on BotChat
[Chart: Success Rate (N=16) and Success Rate (N=8) over time. Highlighted entry: ISM, 86.3 Success Rate (N=16), Aug 1, 2024.]
Evaluation Results
| Method | Backbone    | Date    | Success Rate (N=16) | Success Rate (N=8) |
|--------|-------------|---------|---------------------|--------------------|
| ISM    | Qwen2.5-14B | 2024.08 | 86.3                | 94                 |
| SFT    | Qwen2.5-14B | 2024.08 | 85.6                | 93.4               |
| ISM    | LLaMA3.1-8B | 2024.08 | 79.2                | 93.6               |
| ISM    | Qwen2.5-7B  | 2024.08 | 78.2                | 92                 |
| SFT    | LLaMA3.1-8B | 2024.08 | 77.3                | 90.5               |
| SFT    | Qwen2.5-7B  | 2024.08 | 71.8                | 89.4               |
| Direct | LLaMA3.1-8B | 2024.08 | 67.3                | 89.8               |
| Direct | Qwen2.5-7B  | 2024.08 | 57.6                | 90.1               |
| Direct | Qwen2.5-14B | 2024.08 | 57                  | 87.2               |