Dolly

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Watermark Detection | dolly_cw | Accuracy 100 | 48 |
| Instruction Following | Dolly Eval (test) | ROUGE-L 29.69 | 42 |
| Question Answering | Dolly Closed QA | ASR 100 | 36 |
| Hallucination detection | Dolly AC (test) | AUC 81.59 | 33 |
| Instruction Following | Dolly | Rouge-L 27.47 | 32 |
| Detection Accuracy | dolly_cw | Accuracy 99.27 | 24 |
| Instruction Following | Dolly | SBERT Similarity 71.4 | 24 |
| Instruction Tuning | Dolly-15K alpha=5.0 | Rouge-L 35.79 | 22 |
| Instruction Tuning | Dolly-15K alpha=0.5 | Rouge-L 35.48 | 22 |
| Instruction-tuning | Dolly | RougeL 35.34 | 21 |
| Hallucination Detection | Dolly Llama2-13B (test) | Accuracy 75.76 | 21 |
| Hallucination Detection | Dolly Llama2-7B (test) | Acc 77.78 | 21 |
| Scrubbing Attack | Dolly | AUC 80 | 20 |
| Hallucination Detection | Dolly AC LLaMA3-8B | Recall 83.92 | 19 |
| Hallucination Detection | Dolly AC LLaMA2-13B | Recall 0.9741 | 19 |
| Hallucination Detection | Dolly AC LLaMA2-7B | Recall 87.28 | 19 |
| Instruction Following | Dolly Eval | A Win Count 62 | 19 |
| Spoofing Attack Detection | Dolly CW | WCS 8.88 | 18 |
| Instruction Following | Dolly | Score 71.3 | 18 |
| Instruction Following Evaluation | Dolly Out-of-Distribution | GPT-4o Score 49.9 | 17 |
| Language Generation | Dolly databricks 15k (test) | ROUGE-L 29.7 | 14 |
| Machine Unlearning | Dolly-15k Mistral-7B variant (Seen) | Seen ASR 86 | 14 |
| Data Extraction | Dolly D2 | Mean Match Ratio 49.2 | 11 |
| Fine-tuning Robustness | Dolly Dataset | FSR 92 | 10 |
| Federated Learning | Dolly-15K | Speedup 18.86 | 10 |

Showing 25 of 43 rows