Explanation quality evaluation on Synthetic (test)
[Chart: Helpfulness. Best result: 87.6, Qwen3-VL-8b-SVR-FT, Dec 11, 2025. Selectable metrics: Helpfulness, Accuracy, Relevance, Average Score.]
Updated 4d ago
Evaluation Results
| Method | Tags | Date | Helpfulness | Accuracy | Relevance | Average Score |
|---|---|---|---|---|---|---|
| Qwen3-VL-8b-SVR-FT | Training=SVR-FT | 2025.12 | 87.6 | 83.9 | 95.3 | 89 |
| GPT-5-mini | Model Type=Base | 2025.12 | 73.4 | 65.1 | 79.2 | 72.6 |
| Qwen3-VL-8b-GFT | Training=GFT | 2025.12 | 72.5 | 69.4 | 75.1 | 72.3 |
| Qwen3-VL-8b | Model Type=Base | 2025.12 | 58.2 | 45.2 | 75.2 | 59.5 |
| Gemini-2.5-flash | Model Type=Base | 2025.12 | 46.8 | 37.9 | 53.7 | 46.1 |
| Qwen3-VL-8b-FT | Training=Fine-tuned | 2025.12 | 43.4 | 47.8 | 57.2 | 49.5 |
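
The page does not say how Average Score is computed. The reported values look consistent with an unweighted mean of Helpfulness, Accuracy, and Relevance, but that is an assumption, not something the source states. A minimal Python sketch that checks this against the rows above (the names `ROWS`, `mean_score`, and `check_rows` are illustrative, and the 0.1 tolerance is a guess that allows for rounding in both the displayed inputs and the displayed average):

```python
# Scores copied from the results above:
# (helpfulness, accuracy, relevance, reported average score)
ROWS = {
    "Qwen3-VL-8b-SVR-FT": (87.6, 83.9, 95.3, 89.0),
    "GPT-5-mini": (73.4, 65.1, 79.2, 72.6),
    "Qwen3-VL-8b-GFT": (72.5, 69.4, 75.1, 72.3),
    "Qwen3-VL-8b": (58.2, 45.2, 75.2, 59.5),
    "Gemini-2.5-flash": (46.8, 37.9, 53.7, 46.1),
    "Qwen3-VL-8b-FT": (43.4, 47.8, 57.2, 49.5),
}


def mean_score(helpfulness, accuracy, relevance):
    """Unweighted mean of the three metrics (assumed formula, not confirmed by the page)."""
    return (helpfulness + accuracy + relevance) / 3


def check_rows(rows, tol=0.1):
    """Map each method to True if its reported average is within tol of the mean."""
    return {
        name: abs(mean_score(h, a, r) - reported) <= tol
        for name, (h, a, r, reported) in rows.items()
    }


if __name__ == "__main__":
    for name, ok in check_rows(ROWS).items():
        h, a, r, reported = ROWS[name]
        print(f"{name}: mean={mean_score(h, a, r):.2f} reported={reported} consistent={ok}")
```

If a row fell outside the tolerance, that would suggest a weighted average or a score computed from unrounded per-example values rather than from the displayed metrics.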