Explanation quality evaluation on In-house Dataset
[Chart: Helpfulness. Highlighted point: Qwen3-VL-8b-SVR-FT, 80.8, Dec 11, 2025. Available metrics: Helpfulness, Accuracy, Relevance, Average Score, Expert Rating (1-10).]
Evaluation Results
Method              Tag                  Date     Helpfulness  Accuracy  Relevance  Average Score  Expert Rating (1-10)
Qwen3-VL-8b-SVR-FT  Training=SVR-FT      2025.12  80.8         70.6      91.6       81             6.985
GPT-5-mini          Model Type=Base      2025.12  76.2         68.7      82.9       75.9           4.915
Qwen3-VL-8b-GFT     Training=GFT         2025.12  71.3         67.9      74.2       71.1           6.445
Gemini-2.5-flash    Model Type=Base      2025.12  69.7         52.1      78.3       66.7           5.86
Qwen3-VL-8b         Model Type=Base      2025.12  56           36.8      75.5       56.1           2.44
Qwen3-VL-8b-FT      Training=Fine-tuned  2025.12  40.1         37.1      59         45.4           2.178
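The page does not say how Average Score is derived. The reported values are consistent with an unweighted mean of Helpfulness, Accuracy, and Relevance rounded to one decimal place, with Expert Rating (1-10) left out. The sketch below checks that assumption against the table; the names `ROWS`, `mean_of_metrics`, and `check_rows` are hypothetical and used only for illustration.

```python
# Scores copied from the Evaluation Results table:
# (method, helpfulness, accuracy, relevance, reported average score)
ROWS = [
    ("Qwen3-VL-8b-SVR-FT", 80.8, 70.6, 91.6, 81.0),
    ("GPT-5-mini", 76.2, 68.7, 82.9, 75.9),
    ("Qwen3-VL-8b-GFT", 71.3, 67.9, 74.2, 71.1),
    ("Gemini-2.5-flash", 69.7, 52.1, 78.3, 66.7),
    ("Qwen3-VL-8b", 56.0, 36.8, 75.5, 56.1),
    ("Qwen3-VL-8b-FT", 40.1, 37.1, 59.0, 45.4),
]


def mean_of_metrics(helpfulness, accuracy, relevance):
    """Unweighted mean of the three metrics, rounded to one decimal.

    Assumption: this rounding rule is inferred from the table, not
    documented by the source.
    """
    return round((helpfulness + accuracy + relevance) / 3, 1)


def check_rows(rows):
    """Return (method, computed, reported, matches) for each table row."""
    results = []
    for method, helpfulness, accuracy, relevance, reported in rows:
        computed = mean_of_metrics(helpfulness, accuracy, relevance)
        matches = abs(computed - reported) < 1e-9
        results.append((method, computed, reported, matches))
    return results


if __name__ == "__main__":
    for method, computed, reported, ok in check_rows(ROWS):
        print(f"{method:20s} computed={computed:5.1f} reported={reported:5.1f} match={ok}")
```

Every row matches under this assumption, which suggests the Expert Rating column is reported separately and is not folded into the average.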