
Alpaca

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | Alpaca | Speedup (x) | 5.27 | 111 |
| Language Modeling | Alpaca | Perplexity | 3.22 | 61 |
| Detection Efficiency | Alpaca OnlyTarget Long (malicious) | ATGR | 8.378 | 56 |
| Detection Efficiency | Alpaca OnlyTarget Long (benign) | ATGR | 6.115 | 56 |
| Targeted Attack Detection | Alpaca OnlyTarget Short | TPR | 100 | 56 |
| Targeted Attack Detection | Alpaca OnlyTarget Medium | TPR | 100 | 56 |
| Offline Synthetic Data Generation | Alpaca | Generation Time (s) | 68.8 | 44 |
| Targeted Attack Detection | Alpaca AddTarget Medium | TPR | 100 | 35 |
| Chatbot workload | Alpaca | Average PTLA (s/token) | 0.3 | 28 |
| Prompt injection attack detection | Alpaca | TPR | 100 | 28 |
| Instruction Following | Alpaca Finance | Average Length | 2.78 | 22 |
| LLM Inference | Alpaca | Speedup | 2.95 | 21 |
| Safety defense against harmful fine-tuning attacks | Alpaca harmful subset (test) | Harmful Score | 26.6 | 21 |
| Instruction Tuning | Alpaca GPT4 | Reasoning | 75.43 | 20 |
| Conversational Ability | Alpaca (test) | Alpaca LC Win Rate | 71.87 | 20 |
| Instruction Tuning | Alpaca instruction-tuning 52k | Pairwise Winning Score | 116 | 19 |
| Abnormal Behavior Detection | Alpaca GPT4 (test) | Accuracy | 100 | 17 |
| Long-form reasoning | Alpaca | Avg LogProb per Answer | -1.5772 | 14 |
| Prompt Recovery | Alpaca | BLEU-1 | 43.24 | 14 |
| Instruction Following | Alpaca instruction-following (test) | PPL | 3.85 | 12 |
| Faithfulness Measurement | Alpaca | BLEU | 0.601 | 12 |
| Instruction Following | Alpaca (test) | Kendall's Tau | 4.96 | 11 |
| Bit-flip Inference Cost Attack | Alpaca (test) | Avg Length (Original) | 1,117 | 10 |
| Fine-tuning Robustness | Alpaca Dataset | FSR | 100 | 10 |
| Attack Success Rate | Alpaca | ASR (Alpaca) | 0 | 8 |
Showing 25 of 52 rows