Question Answering Utility Evaluation on CAPID (test)
[Chart: score by date (Feb 10, 2026). Series: GPT-4 Score, Claude Score. Labeled point: Llama-3.1-8B (FT), GPT-4 Score 79.]
Updated 4d ago
Evaluation Results
Method                             | Details                   | Links   | GPT-4 Score | Claude Score
Llama-3.1-8B (FT)                  | Training=Fine-tuned, B... | 2026.02 | 79          | 73
Llama-3.1-8B (Ngong et al., 2025)  | Backbone=Llama-3.1-8B,... | 2026.02 | 51          | 43