Long-form QA on LiveQA (test)
Best ROUGE-1: 27.33 (GPT-4o + MedBioRAG), Dec 10, 2025
Evaluation Results
| Method | Settings | Date | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU | BERTScore | BLEURT |
|---|---|---|---|---|---|---|---|---|
| GPT-4o + MedBioRAG | Fine-tuned=false, MedB... | 2025.12 | 27.33 | 6.39 | 13.42 | 15.29 | -1.6 | -29.99 |
| GPT-4o | Fine-tuned=false, MedB... | 2025.12 | 26.96 | 5.8 | 13.42 | 1.41 | -2.93 | -34.79 |
| Fine-Tuned GPT-4o | Fine-tuned=true, MedBi... | 2025.12 | 24.12 | 6.18 | 13.31 | 1.63 | 1.1 | -46.48 |
| Fine-Tuned GPT-4o + MedBioRAG | Fine-tuned=true, MedBi... | 2025.12 | 15.73 | 4.58 | 10.74 | 1.2 | 2.29 | -86.99 |