Alpaca

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Instruction Following | Alpaca | Speedup (x) | 5.27 | 173 |
| Language Modeling | Alpaca | Perplexity | 3.22 | 61 |
| LLM Inference | Alpaca | Speedup | 5.56 | 57 |
| Detection Efficiency | Alpaca OnlyTarget Long (malicious) | ATGR | 8.378 | 56 |
| Detection Efficiency | Alpaca OnlyTarget Long (benign) | ATGR | 6.115 | 56 |
| Targeted Attack Detection | Alpaca OnlyTarget Short | TPR | 100 | 56 |
| Targeted Attack Detection | Alpaca OnlyTarget Medium | TPR | 100 | 56 |
| Instruction Following | Alpaca | Average Accepted Length | 6.27 | 51 |
| Offline Synthetic Data Generation | Alpaca | Generation Time (s) | 68.8 | 44 |
| Safety Evaluation | Alpaca | HRR | 0 | 38 |
| Instruction Following | Alpaca SFT Small Prompts | Self-BLEU | 46.6 | 36 |
| Creative Writing | Alpaca SFT Short Stories | Self-BLEU (Diversity) | 3.7 | 36 |
| Instruction following and safety evaluation | Alpaca | BRT Score | 52.2 | 36 |
| Targeted Attack Detection | Alpaca AddTarget Medium | TPR | 100 | 35 |
| Instruction Following | Alpaca clean (test) | F1 Score | 79.81 | 32 |
| Long-form Generation | Alpaca | Perplexity (PPL) | 2.4268 | 30 |
| Instruction Following | Alpaca poisoned (test) | F1 Score | 99.93 | 28 |
| Chatbot workload | Alpaca | Average PTLA (s/token) | 0.3 | 28 |
| Prompt injection attack detection | Alpaca | TPR | 100 | 28 |
| Answer Accuracy | Alpaca | BRT Accuracy | 40.6 | 26 |
| MMLU Evaluation | Alpaca | Accuracy | 32.26 | 24 |
| LLM Unlearning | Virtual-Alpaca | Forget Rate | 44 | 24 |
| Instruction Following | Alpaca Finance | Average Length | 2.78 | 22 |
| Instruction Following | Alpaca (test) | SR Score | 3.15 | 21 |
| Safety defense against harmful fine-tuning attacks | Alpaca harmful subset (test) | Harmful Score | 26.6 | 21 |

Showing 25 of 78 rows