Explanation quality evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Value
Chest X Pneumothorax (test)	ViT-L/16	Relevance Rank Accuracy (FeatPerm)	2.2	11	2mo ago
Oxford-IIIT Pet (test)	ResNet-50	Rank Acc (FeatPerm)	37.2	11	2mo ago
COVID-Qu-Ex (test)	DenseNet-169	RRA (FeatPerm)	37.7	11	2mo ago
Deepfake Detection Dataset DDIM, PixArt, SD, SiT, StyleGAN	PRPO	CAC	4.42	9	2mo ago
LIAR RAW		Meaningfulness Score	2.29	7	3mo ago
RAW-FC		M Score	2.07	7	3mo ago
LIAR-RAW (test)		ChatGPT Meaningfulness Score	1.53	7	3mo ago
Synthetic (test)	Qwen3-VL-8b-SVR-FT	Helpfulness	87.6	6	5mo ago
In-house Dataset	Qwen3-VL-8b-SVR-FT	Helpfulness	80.8	6	5mo ago
DFD 100 randomly selected samples (test)	VRAG-DFD	GPT-4o Score	7.55	3	3mo ago
MMLU-CK (test)	PubMed Reasoner	Reasoning Soundness Loss (%)	44	2	3mo ago
PubMedQA (test)	PubMed Reasoner	Reasoning Soundness Loss	39.7	2	3mo ago
