
Vicuna

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Watermark Detection | Vicuna-7b 16k 50 samples v1.5 | AUROC (Overall) | 0.986 | 94 |
| Watermark Attack Robustness | Vicuna 7b 16k v1.5 (test) | ASR | 62 | 30 |
| Instruction Following | Vicuna | SBERT Similarity | 73.6 | 24 |
| Instruction Following | Vicuna benchmark zero-shot | Pairwise Score (ChatGPT vs Sys) | 119.4 | 21 |
| Instruction Following | Vicuna Eval | Win Rate (A) | 66.3 | 19 |
| Instruction Following | Vicuna benchmark | GPT-4 Evaluation Score | 8.09 | 18 |
| Instruction Following | Vicuna | Score | 58.2 | 18 |
| Human alignment evaluation | Vicuna Evaluation Benchmark | Accuracy | 76.3 | 16 |
| Response generation | Vicuna 80 prompts (test) | Elo | 1,348 | 16 |
| Watermark Evasion | vicuna-7b 50 samples, UMD watermarking v1.5-16k (test) | ASR (0 Unattacked) | 58 | 15 |
| Output Equivalence | Vicuna | Exact Match | 97.3 | 13 |
| Instruction Following | Vicuna-bench | Score | 8.24 | 13 |
| Language Instruction Following | Vicuna-80 v1 (test) | Score | 85.6 | 10 |
| Chatbot Evaluation | Vicuna benchmark | Elo Rating | 13,481 | 8 |
| Computational Efficiency Evaluation | Vicuna | ATGR | 0.88 | 7 |
| Open-ended instruction following | Vicuna Eval v1.3 (test) | A Win Rate | 65 | 7 |
| Instruction Following | Vicuna low-resource | Win Rate (bn) | 0.85 | 7 |
| Instruction Following | Vicuna | Rouge-L | 17.8 | 6 |
| Jailbreak Attack | Vicuna | ASR | 96.67 | 5 |
| Instruction Following Evaluation | Vicuna Eval | Win Rate (A) | 63.8 | 5 |
| Instruction Following | Vicuna (test) | Score A | 669 | 3 |