Long-form QA on BioASQ (test)
Top result: 34.3 ROUGE-1, Fine-Tuned GPT-4o + MedBioRAG (Dec 10, 2025)
Updated 4d ago
Evaluation Results
| Method | Method details | Links | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU | BERTScore | BLEURT |
|---|---|---|---|---|---|---|---|---|
| Fine-Tuned GPT-4o + MedBioRAG | Fine-tuned=true, MedBi... | 2025.12 | 34.3 | 18.81 | 27.74 | 6.12 | 35.43 | -15.44 |
| Fine-Tuned GPT-4o | Fine-tuned=true, MedBi... | 2025.12 | 32.69 | 16.84 | 25.11 | 6.52 | 32.97 | -2.41 |
| GPT-4o + MedBioRAG | Fine-tuned=false, MedB... | 2025.12 | 22.29 | 8.21 | 15.64 | 2.27 | 11.6 | -12.5 |
| GPT-4o | Fine-tuned=false, MedB... | 2025.12 | 13.97 | 5.51 | 10.08 | 1.27 | 0.22 | -24.84 |
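
To compare how much each configuration adds over plain GPT-4o, the ROUGE-1 column above can be turned into absolute gains. The following is a minimal sketch, not part of the leaderboard itself: the function name `gains_over_baseline` is hypothetical, and the scores are copied by hand from the results listed above.

```python
# ROUGE-1 scores copied from the evaluation results above.
rouge1 = {
    "Fine-Tuned GPT-4o + MedBioRAG": 34.3,
    "Fine-Tuned GPT-4o": 32.69,
    "GPT-4o + MedBioRAG": 22.29,
    "GPT-4o": 13.97,
}


def gains_over_baseline(scores, baseline="GPT-4o"):
    """Return each method's absolute score gain over the baseline method.

    Hypothetical helper for illustration; the baseline itself is omitted
    from the returned dictionary.
    """
    base = scores[baseline]
    return {
        name: round(value - base, 2)
        for name, value in scores.items()
        if name != baseline
    }


gains = gains_over_baseline(rouge1)
for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} ROUGE-1 over GPT-4o")
```

On these numbers, fine-tuning alone accounts for most of the ROUGE-1 improvement, and adding MedBioRAG on top of the fine-tuned model contributes a smaller additional gain. The same helper can be applied to any other metric column by swapping in its values.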