Dolly

Benchmarks

Task Name	Dataset Name	SOTA Result
Instruction Following	Dolly	Rouge-L 27.74	50
Watermark Detection	dolly_cw	Accuracy 100	48
Instruction Following	Dolly Eval (test)	ROUGE-L 29.69	42
Question Answering	Dolly Closed QA	ASR 100	36
Instruction Following	Dolly	Score 71.3	36
Hallucination detection	Dolly AC (test)	AUC 81.59	33
General Question Answering & Instruction Following	Dolly	MP Score 22	24
Data Leakage Attack	Dolly	AP (alpha=0.5) 98.5	24
MMLU Evaluation	Dolly	Accuracy 30.94	24
Detection Accuracy	dolly_cw	Accuracy 99.27	24
Instruction Following	Dolly	SBERT Similarity 71.4	24
Instruction Tuning	Dolly-15K alpha=5.0	Rouge-L 35.79	22
Instruction Tuning	Dolly-15K alpha=0.5	Rouge-L 35.48	22
Instruction-tuning	Dolly	RougeL 35.34	21
Hallucination Detection	Dolly Llama2-13B (test)	Accuracy 75.76	21
Hallucination Detection	Dolly Llama2-7B (test)	Acc 77.78	21
Scrubbing Attack	Dolly	AUC 80	20
Hallucination Detection	Dolly AC LLaMA3-8B	Recall 83.92	19
Hallucination Detection	Dolly AC LLaMA2-13B	Recall 0.9741	19
Hallucination Detection	Dolly AC LLaMA2-7B	Recall 87.28	19
Instruction Following	Dolly Eval	A Win Count 62	19
Open-ended generation	Dolly	Skywork Reward V2 Score 0.961	18
Spoofing Attack Detection	Dolly CW	WCS 8.88	18
Instruction Following Evaluation	Dolly Out-of-Distribution	GPT-4o Score 49.9	17
Language Generation	Dolly databricks 15k (test)	ROUGE-L 29.7	14

Showing 25 of 50 rows