
MIRAGE

Benchmarks

Task Name | Dataset Name | Metric | SOTA Result | Trend
------------------------------------------------------
AI-Generated Image Detection | Mirage (test) | Human Overall Accuracy | 99.18 | 14
Coarse-level Multimodal Misinformation Detection | MiRAGe News | Accuracy | 80.2 | 14
Medical Question Answering | MIRAGE (test) | MMLU-Med | 89.44 | 12
Biomedical Retrieval-Augmented Generation | Mirage | MMLU-med Accuracy | 87.24 | 10
Flicker-banding and Moire Removal | MIRAGE cropped (test) | SSIM | 0.7354 | 9
GUI Agent Attack Success Rate Evaluation | MIRAGE (1,111-sample main set) | FB Success Rate | 41 | 5
Multi-modal Forgery Detection | MiRAGe | Accuracy | 53.92 | 5
Binary Forgery Detection | MiRAGe | Accuracy | 56.99 | 5
Multi-choice | MIRAGE | Accuracy | 58.3 | 2
Dataset Diversity and Coverage Evaluation | MIRAGE 3-app overlap | Goal-text Entropy | 0.918 | 1
Dataset Diversity and Coverage Evaluation | MIRAGE matched-n | Goal-text Entropy | 0.927 | 1
Dataset Diversity and Coverage Evaluation | MIRAGE full | Goal-text Entropy | 0.933 | 1
Data Source Relevance Classification | MIRAGE (test) | Accuracy | 86.63 | 1