AMBER

Benchmarks

Task Name                             Dataset Name                     SOTA Result         Trend
Hallucination Evaluation              AMBER                            F1 Score 90.9       71
Hallucination Assessment              AMBER                            CHAIR_s 10.6        47
Hallucination Assessment              AMBER (test)                     CHAIR 5.6           38
Hallucination Detection               AMBER sampled 5k                 A-ROC 85.99         30
Generative Hallucination              AMBER Generative                 CHAIR Score 8.4     24
Generative Hallucination              AMBER generative subset          CHAIR 10.9          22
Watermarking                          AMBER                            AUC 99.99           18
Multimodal Watermarking               AMBER                            PPL 2.98            14
Multi-modal Hallucination Evaluation  AMBER                            Mean Accuracy 76.9  10
Discriminative Task                   AMBER Discrimination 1.0 (test)  Accuracy 76.7       10
Next Token Prediction                 Amber 1.2T tokens                BPD 4.28            4