Vicuna

Benchmarks

Task Name	Dataset Name	SOTA Result
Instruction Following	Vicuna	Rouge-L 20.93	101
Watermark Detection	Vicuna-7b 16k 50 samples v1.5	AUROC (Overall) 0.986	94
Adversarial Jailbreak Attack	Vicuna 7B	Attack Success Rate (ASR) 98.46	58
Adversarial Jailbreak Attack	Vicuna 13B	Attack Success Rate (ASR) 98.65	55
Instruction Following	Vicuna Eval (test)	ROUGE-L 20.37	36
Watermark Attack Robustness	Vicuna 7b 16k v1.5 (test)	ASR 62	30
Instruction Following	Vicuna	SBERT Similarity 73.6	24
Instruction Following	Vicuna benchmark zero-shot	Pairwise Score (ChatGPT vs Sys) 119.4	21
LLM-as-a-Judge Evaluation	Vicuna Benchmark	Pearson Correlation (r) 65.1	20
Instruction Tuning	Vicuna	RougeL Score 18.73	19
Instruction Following	Vicuna Eval	Win Rate (A) 66.3	19
Open-ended generation	Vicuna	Skywork Reward V2 Score 99.1	18
Hallucination Detection	SC-Vicuna	AUROC 71.4	18
Instruction Following	Vicuna benchmark	GPT-4 Evaluation Score 8.09	18
Instruction Following	Vicuna	Score 58.2	18
Instruction Following Evaluation	Vicuna Out-of-Distribution	GPT-4o Score 51.9	17
Dialogue Generation	Vicuna	Rouge-L 15.05	16
Human alignment evaluation	Vicuna Evaluation Benchmark	Accuracy 76.3	16
Response generation	Vicuna 80 prompts (test)	Elo 1,348	16
Watermark Evasion	vicuna-7b 50 samples, UMD watermarking v1.5-16k (test)	ASR (0 Unattacked) 58	15
Language Generation	Vicuna (test)	ROUGE-L 19.4	14
Output Equivalence	Vicuna	Exact Match 97.3	13
Instruction Following	Vicuna-bench	Score 8.24	13
Instruction Following	Vicuna Eval	ROUGE-L 16.31	11
Language Instruction Following	Vicuna-80 v1 (test)	Score 85.6	10

Showing 25 of 41 rows