Dolly

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Watermark Detection | dolly_cw | Accuracy | 100 | 48 |
| Hallucination detection | Dolly AC (test) | AUC | 81.59 | 33 |
| Detection Accuracy | dolly_cw | Accuracy | 99.27 | 24 |
| Instruction Following | Dolly | SBERT Similarity | 71.4 | 24 |
| Instruction Tuning | Dolly-15K alpha=5.0 | Rouge-L | 35.79 | 22 |
| Instruction Tuning | Dolly-15K alpha=0.5 | Rouge-L | 35.48 | 22 |
| Hallucination Detection | Dolly Llama2-13B (test) | Accuracy | 75.76 | 21 |
| Hallucination Detection | Dolly Llama2-7B (test) | Acc | 77.78 | 21 |
| Hallucination Detection | Dolly AC LLaMA3-8B | Recall | 83.92 | 19 |
| Hallucination Detection | Dolly AC LLaMA2-13B | Recall | 0.9741 | 19 |
| Hallucination Detection | Dolly AC LLaMA2-7B | Recall | 87.28 | 19 |
| Instruction Following | Dolly Eval | A Win Count | 62 | 19 |
| Instruction Following | Dolly | Score | 71.3 | 18 |
| Machine Unlearning | Dolly-15k Mistral-7B variant (Seen) | Seen ASR | 86 | 14 |
| Fine-tuning Robustness | Dolly Dataset | FSR | 92 | 10 |
| Federated Learning | Dolly-15K | Speedup | 18.86 | 10 |
| Machine Unlearning | Dolly-15k OOD triggers 1.0 (test) | OOD ASR | 47.2 | 7 |
| Machine Unlearning | Dolly-15k Mistral-7B variant (OOD) | OOD ASR | 27.7 | 7 |
| Machine Unlearning | Dolly-15k Clean Mistral-7B variant (val) | Clean PPL | 19.7 | 7 |
| Open-ended instruction following | Dolly Eval | A Win Rate | 54 | 7 |
| Watermark Spoofing | Dolly CW | TPR @ FPR=10% | 67 | 6 |
| LLM-rated generation quality | Dolly | Correctness | 4.1 | 6 |
| Instruction Following | Dolly | Rouge-L | 25.2 | 6 |
| Hallucination Detection | Dolly-15k Qwen2.5-7B (test) | Precision | 84.21 | 6 |
| Hallucination Detection | Dolly-15k Qwen2.5-3B (test) | Precision | 80.55 | 6 |
Showing 25 of 32 rows