WIKIBIO

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Faithfulness evaluation | WikiBio | AUC π-Soft-NS: 0.438 | 27 |
| Data-to-Text Generation | WikiBio (test) | BLEU: 45.14 | 17 |
| Knowledge modification | WikiBio | Edit Success Rate: 100 | 15 |
| Hallucination self-detection | WikiBio GPT-4o | Accuracy: 85 | 12 |
| Text Generation | WikiBIO | BLEU: 9.68 | 11 |
| Sentence-Level Confidence Prediction | WikiBio | AUROC: 68.6 | 10 |
| Table-to-text generation | WIKIBIO (test) | BLEU-4: 44.89 | 10 |
| Hallucination Detection | WikiBio GPT-3.5-Turbo-Instruct (test) | AUC-PR (Nonfactual): 92.5 | 8 |
| Data-to-text generation | WikiBio 22 | BLEU: 47.17 | 7 |
| Knowledge Editing | WikiBio (test) | RwA: 84.33 | 6 |
| Table-to-Text Generation | WikiBio (val) | Fluency: 99.6 | 4 |
| Hallucination Detection | WikiBio | Metric: - | 0 |
| Faithfulness evaluation | WikiBio (test) | AUC π-Soft-NS: - | 0 |