Vicuna

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Watermark Detection | Vicuna-7b 16k 50 samples v1.5 | AUROC (Overall): 0.986 | 94 |
| Instruction Following | Vicuna | Rouge-L: 20.93 | 83 |
| Instruction Following | Vicuna Eval (test) | ROUGE-L: 20.37 | 36 |
| Watermark Attack Robustness | Vicuna 7b 16k v1.5 (test) | ASR: 62 | 30 |
| Instruction Following | Vicuna | SBERT Similarity: 73.6 | 24 |
| Instruction Following | Vicuna benchmark zero-shot | Pairwise Score (ChatGPT vs Sys): 119.4 | 21 |
| LLM-as-a-Judge Evaluation | Vicuna Benchmark | Pearson Correlation (r): 65.1 | 20 |
| Instruction Following | Vicuna Eval | Win Rate (A): 66.3 | 19 |
| Hallucination Detection | SC-Vicuna | AUROC: 71.4 | 18 |
| Instruction Following | Vicuna benchmark | GPT-4 Evaluation Score: 8.09 | 18 |
| Instruction Following | Vicuna | Score: 58.2 | 18 |
| Instruction Following Evaluation | Vicuna Out-of-Distribution | GPT-4o Score: 51.9 | 17 |
| Human alignment evaluation | Vicuna Evaluation Benchmark | Accuracy: 76.3 | 16 |
| Response generation | Vicuna 80 prompts (test) | Elo: 1,348 | 16 |
| Watermark Evasion | vicuna-7b 50 samples, UMD watermarking v1.5-16k (test) | ASR (0 Unattacked): 58 | 15 |
| Language Generation | Vicuna (test) | ROUGE-L: 19.4 | 14 |
| Output Equivalence | Vicuna | Exact Match: 97.3 | 13 |
| Instruction Following | Vicuna-bench | Score: 8.24 | 13 |
| Instruction Tuning | Vicuna | RougeL Score: 18.73 | 11 |
| Instruction Following | Vicuna Eval | ROUGE-L: 16.31 | 11 |
| Language Instruction Following | Vicuna-80 v1 (test) | Score: 85.6 | 10 |
| Chatbot Evaluation | Vicuna benchmark | Elo Rating: 13,481 | 8 |
| Model Fingerprinting Robustness | Vicuna 1.5-7B | Similarity Score: 99.99 | 7 |
| Computational Efficiency Evaluation | Vicuna | ATGR: 0.88 | 7 |
| Open-ended instruction following | Vicuna Eval v1.3 (test) | A Win Rate: 65 | 7 |
Showing 25 of 32 rows