Multi-task Evaluation on Average (test)
Figure: Hit Score over time. Best result: NA-SFT, 52.45 (May 28, 2024). Metrics shown: Hit Score, False Alarm Rate.
Evaluation Results
Method       Backbone    Date     Hit Score  False Alarm Rate
NA-SFT       Mistral-7B  2024.05  52.45      68.65
SFT          Mistral-7B  2024.05  47.25      63.94
BSO          Mistral-7B  2024.05  43.63      62.97
Vlguard      Mistral-7B  2024.05  42.85      62.81
Vaccine-SFT  Mistral-7B  2024.05  42.25      46.76
Lisa         Mistral-7B  2024.05  37.53      63.74