Long-form QA on PubMedQA (test)
Best ROUGE-1: 37.49 (Fine-Tuned GPT-4o + MedBioRAG, Dec 10, 2025)
Metrics: ROUGE-1, ROUGE-2, ROUGE-L, BLEU, BERTScore, BLEURT
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU | BERTScore | BLEURT |
|---|---|---|---|---|---|---|---|---|
| Fine-Tuned GPT-4o + MedBioRAG | Fine-tuned=true, MedBi... | 2025.12 | 37.49 | 14.78 | 27.89 | 6.11 | 37.02 | -3.89 |
| Fine-Tuned GPT-4o | Fine-tuned=true, MedBi... | 2025.12 | 35.82 | 13.55 | 26.09 | 4.34 | 35.33 | -9.23 |
| GPT-4o + MedBioRAG | Fine-tuned=false, MedB... | 2025.12 | 26.39 | 9.55 | 17.47 | 2.73 | 18.1 | -7.86 |
| GPT-4o | Fine-tuned=false, MedB... | 2025.12 | 25.72 | 9.02 | 17.05 | 2.48 | 17.04 | -9.04 |
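
The four result rows above are easiest to compare as differences against the plain GPT-4o row. The following is a minimal sketch of that arithmetic, not part of the leaderboard itself: the scores are copied from the listing, while the names `SCORES`, `deltas_vs_baseline`, and `best_per_metric` are hypothetical helpers introduced only for illustration.

```python
# Scores copied from the Evaluation Results listing (PubMedQA test, long-form QA).
# Order of values in each list follows METRICS.
METRICS = ["ROUGE-1", "ROUGE-2", "ROUGE-L", "BLEU", "BERTScore", "BLEURT"]

SCORES = {
    "Fine-Tuned GPT-4o + MedBioRAG": [37.49, 14.78, 27.89, 6.11, 37.02, -3.89],
    "Fine-Tuned GPT-4o": [35.82, 13.55, 26.09, 4.34, 35.33, -9.23],
    "GPT-4o + MedBioRAG": [26.39, 9.55, 17.47, 2.73, 18.1, -7.86],
    "GPT-4o": [25.72, 9.02, 17.05, 2.48, 17.04, -9.04],
}


def deltas_vs_baseline(scores, baseline="GPT-4o"):
    """Return each method's per-metric difference from the baseline row."""
    base = scores[baseline]
    return {
        method: [round(value - ref, 2) for value, ref in zip(values, base)]
        for method, values in scores.items()
        if method != baseline
    }


def best_per_metric(scores, metrics=METRICS):
    """Return the highest-scoring method name for each metric."""
    return {
        metric: max(scores, key=lambda name: scores[name][i])
        for i, metric in enumerate(metrics)
    }


if __name__ == "__main__":
    for method, diff in deltas_vs_baseline(SCORES).items():
        print(method, dict(zip(METRICS, diff)))
    print(best_per_metric(SCORES))
```

Differences are rounded to two decimals to match the precision of the listed scores. Treating plain GPT-4o as the baseline is a choice made for this sketch; passing another method name as `baseline` changes the reference row.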