MIRAGE

Benchmarks

Task Name	Dataset Name	SOTA Result
AI-Generated Image Detection	Mirage (test)	Human Overall Accuracy 99.18	14
Coarse-level Multimodal Misinformation Detection	MiRAGe News	Accuracy 80.2	14
Medical Question Answering	MIRAGE (test)	MMLU-Med 89.44	12
Biomedical Retrieval-Augmented Generation	Mirage	MMLU-med Accuracy 87.24	10
Flicker-banding and Moire Removal	MIRAGE cropped (test)	SSIM 0.7354	9
AI-generated text detection	MIRAGE six task subsets	AUROC 0.963	5
GUI Agent Attack Success Rate Evaluation	MIRAGE (1,111-sample main set)	FB Success Rate 41	5
Multi-modal Forgery Detection	MiRAGe	Accuracy 53.92	5
Binary forgery detection	MiRAGe	Accuracy 56.99	5
Multi-choice	MIRAGE	Accuracy 58.3	2
Dataset Diversity and Coverage Evaluation	MIRAGE 3-app overlap	Goal-text Entropy 0.918	1
Dataset Diversity and Coverage Evaluation	MIRAGE matched-n	Goal-text Entropy 0.927	1
Dataset Diversity and Coverage Evaluation	MIRAGE full	Goal-text Entropy 0.933	1
Data source relevance classification	MIRAGE (test)	Accuracy 86.63	1
