
Vicuna

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Instruction Following | Vicuna | Rouge-L 20.93 | 101 |
| Watermark Detection | Vicuna-7b 16k 50 samples v1.5 | AUROC (Overall) 0.986 | 94 |
| Adversarial Jailbreak Attack | Vicuna 7B | Attack Success Rate (ASR) 98.46 | 58 |
| Adversarial Jailbreak Attack | Vicuna 13B | Attack Success Rate (ASR) 98.65 | 55 |
| Instruction Following | Vicuna Eval (test) | ROUGE-L 20.37 | 36 |
| Watermark Attack Robustness | Vicuna 7b 16k v1.5 (test) | ASR 62 | 30 |
| Instruction Following | Vicuna | SBERT Similarity 73.6 | 24 |
| Instruction Following | Vicuna benchmark zero-shot | Pairwise Score (ChatGPT vs Sys) 119.4 | 21 |
| LLM-as-a-Judge Evaluation | Vicuna Benchmark | Pearson Correlation (r) 65.1 | 20 |
| Instruction Tuning | Vicuna | RougeL Score 18.73 | 19 |
| Instruction Following | Vicuna Eval | Win Rate (A) 66.3 | 19 |
| Open-ended generation | Vicuna | Skywork Reward V2 Score 99.1 | 18 |
| Hallucination Detection | SC-Vicuna | AUROC 71.4 | 18 |
| Instruction Following | Vicuna benchmark | GPT-4 Evaluation Score 8.09 | 18 |
| Instruction Following | Vicuna | Score 58.2 | 18 |
| Instruction Following Evaluation | Vicuna Out-of-Distribution | GPT-4o Score 51.9 | 17 |
| Dialogue Generation | Vicuna | Rouge-L 15.05 | 16 |
| Human alignment evaluation | Vicuna Evaluation Benchmark | Accuracy 76.3 | 16 |
| Response generation | Vicuna 80 prompts (test) | Elo 1,348 | 16 |
| Watermark Evasion | vicuna-7b 50 samples, UMD watermarking v1.5-16k (test) | ASR (0 Unattacked) 58 | 15 |
| Language Generation | Vicuna (test) | ROUGE-L 19.4 | 14 |
| Output Equivalence | Vicuna | Exact Match 97.3 | 13 |
| Instruction Following | Vicuna-bench | Score 8.24 | 13 |
| Instruction Following | Vicuna Eval | ROUGE-L 16.31 | 11 |
| Language Instruction Following | Vicuna-80 v1 (test) | Score 85.6 | 10 |
Showing 25 of 40 rows