Groundedness on HaluEval

0.78 Kendall's Tau

GPT-OSS-120B

Updated 4d ago

Evaluation Results

Method	Links	Kendall's Tau	(unlabeled)
GPT-OSS-120B 2025.12		0.78	0.03
Jury-on-Demand 2025.12		0.77	0.02
GPT-OSS-20B 2025.12		0.76	0.03
Gemini 2.5 Flash 2025.12		0.74	0.04
Gemini 2.5 Pro 2025.12		0.73	0.03
Claude 3.7 2025.12		0.67	0.04
Gemini 2.0 Flash 2025.12		0.52	0.03
DeepSeek R1 2025.12		0.4	0.03
LLAMA 3.2 2025.12		0.2	0.05
Gemma 3 2025.12		0.17	0.04
Phi 4 2025.12		0.14	0.05
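
For readers unfamiliar with the metric, here is a minimal sketch of how a Kendall's Tau score like those in the table could be computed between a judge's groundedness scores and human ratings. The page does not say which variant of the statistic is used, so the tie-aware tau-b variant is an assumption here, and the function name `kendall_tau_b` and the sample data are hypothetical, not taken from the leaderboard.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences.

    Assumption: the tau-b variant (which corrects for ties) is used;
    the leaderboard does not specify the variant.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both sequences: counts toward neither term
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # pair is ordered the same way in x and y
        else:
            discordant += 1  # pair is ordered in opposite ways

    denom = sqrt(
        (concordant + discordant + ties_x_only)
        * (concordant + discordant + ties_y_only)
    )
    if denom == 0:
        return float("nan")  # undefined when one sequence is constant
    return (concordant - discordant) / denom


# Hypothetical data: human groundedness ratings vs. a judge model's scores.
human_ratings = [1, 2, 3, 4, 5]
judge_scores = [0.1, 0.3, 0.2, 0.8, 0.9]
tau = kendall_tau_b(human_ratings, judge_scores)
print(round(tau, 2))
```

In this toy example, 9 of the 10 item pairs are ranked in the same order by both raters and 1 pair is swapped, so tau is (9 - 1) / 10 = 0.8. A value of 1 means the judge orders every pair the same way as the human ratings, and 0 means no rank agreement.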