Hallucination Diagnosis on Hallucination Diagnosis Dataset Wikipedia-based
[Chart: leading AS result, Qwen3-32B at 79.55 (Dec 31, 2025). Metrics shown: AS, Accuracy, HR, SV.]
Evaluation Results
| Method | Details | Date | AS | Accuracy | HR | SV |
|---|---|---|---|---|---|---|
| Qwen3-32B | Evaluation Protocol=Pi... | 2025.12 | 79.55 | 94.74 | 76.97 | 69.69 |
| o4-mini | Evaluation Protocol=Pi... | 2025.12 | 78.51 | 82.46 | 78.59 | 79.71 |
| GPT 4.1 | Evaluation Protocol=Pi... | 2025.12 | 76.77 | 61.75 | 60.52 | 53.08 |
| Qwen3-32B | Evaluation Protocol=Si... | 2025.12 | 70.98 | 97.54 | 64.85 | 59.15 |
| HDM-4B-RL | Evaluation Protocol=Si... | 2025.12 | 69.16 | 92.28 | 58.65 | 48.49 |
| GPT 4.1 | Evaluation Protocol=Si... | 2025.12 | 65.97 | 77.54 | 59.12 | 48.1 |
| o4-mini | Evaluation Protocol=Si... | 2025.12 | 61.18 | 89.12 | 41.3 | 43.04 |
| Original Result | Description=Baseline f... | 2025.12 | 59.77 | - | - | - |