Question Answering Utility Evaluation on Reddit (test)
[Chart: GPT-4 Score and Claude Score over time. Plotted point: Llama-3.1-8B (FT), GPT-4 Score 0.8, Feb 10, 2026.]
Evaluation Results
Method                              Details                     Date      GPT-4 Score   Claude Score
Llama-3.1-8B (FT)                   Training=Fine-tuned, B...   2026.02   0.8           0.79
Llama-3.1-8B (Ngong et al., 2025)   Backbone=Llama-3.1-8B,...   2026.02   0.58          0.48