
Reasoning Evaluation on DeepfakeJudge Reason 1.0 (test)

BLEU-1: 9

Qwen-3-VL-30B-Instruct

Updated 4mo ago

Evaluation Results

Method	Links	BLEU-1
Qwen-3-VL-30B-Instruct 2026.02		9	3	3	36	6	20	18	62	3.31
InternVL3.5-GPT-OSS-20B-A4B 2026.02		8	3	3	34	6	20	17	60	2.79
Qwen-3-VL-2B-Instruct 2026.02		7	2	2	31	4	18	15	59	2.36
Microsoft-Phi-4-Multimodal-Instruct 2026.02		6	2	2	30	6	18	12	60	2.82
Gemini-2.5-Flash 2026.02		5	2	2	30	5	17	17	60	3.17
InternVL3.5-1B-HF 2026.02		5	2	2	27	5	17	15	56	2.44
Google-Gemma-3-12B 2026.02		5	2	2	29	5	18	12	60	2.7
Qwen2.5-VL-Gen-Buster 2026.02		5	2	2	26	3	15	15	57	2.33
Qwen-3-VL-235B-Instruct 2026.02		4	1	1	30	4	17	16	60	3.59
Qwen-3-VL-30B-Thinking 2026.02		3	1	1	26	3	15	15	59	3.21
Qwen-3-VL-235B-Thinking 2026.02		3	1	1	27	4	16	15	60	3.43
Qwen-3-VL-8B-Thinking 2026.02		2	1	1	25	3	14	13	58	2.81
ChatGPT-4o-mini 2026.02		1	1	1	15	1	8	5	35	2.83
Qwen-3-VL-4B-Instruct 2026.02		1	1	1	20	1	11	12	56	2.93
Qwen-3-VL-8B-Instruct 2026.02		1	1	1	16	1	10	9	53	2.51
SIDA-13B-Description 2026.02		1	1	1	24	3	16	15	58	2.32