Groundedness on CAQA

0.68 Kendall's Tau

Jury-on-Demand

Updated 4d ago

Evaluation Results

Method	Links	Kendall's Tau
Jury-on-Demand 2025.12		0.68	0.03
GPT-OSS-20B 2025.12		0.6	0.02
GPT-OSS-120B 2025.12		0.6	0.03
Gemini 2.0 Flash 2025.12		0.59	0.03
Gemini 2.5 Flash 2025.12		0.59	0.03
Claude 3.7 2025.12		0.56	0.03
Gemini 2.5 Pro 2025.12		0.56	0.02
DeepSeek R1 2025.12		0.26	0.04
LLAMA 3.2 2025.12		0.08	0.06
Gemma 3 2025.12		0.02	0.02
Phi 4 2025.12		0.01	0.03
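
The results above are ranked by Kendall's Tau, a rank correlation coefficient. As a minimal sketch of how such a score could be computed, the block below implements the tau-b variant in plain Python. It assumes a judge's scores are compared with reference labels for the same items; the page does not state which tau variant or pairing the benchmark uses. The names `kendall_tau_b`, `reference`, and `judge` are hypothetical, and the sample scores are made up for illustration only.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length sequences of scores."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    # Compare every pair of items once.
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both sequences: excluded from both denominator factors
        if dx == 0:
            ties_x += 1  # tied only in xs
        elif dy == 0:
            ties_y += 1  # tied only in ys
        elif dx * dy > 0:
            concordant += 1  # both sequences order the pair the same way
        else:
            discordant += 1  # the sequences order the pair in opposite ways
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    if denom == 0:
        raise ValueError("tau is undefined when a sequence is constant")
    return (concordant - discordant) / denom


# Illustrative data only (not taken from the benchmark).
reference = [1, 2, 3, 4, 5]
judge = [1, 3, 2, 4, 5]
tau = kendall_tau_b(reference, judge)  # one swapped pair out of ten
```

A value of 1 means the two sequences order every pair identically, -1 means they order every pair in reverse, and values near 0 indicate little rank agreement.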