Overall Generation Quality on Generative QA Protocol Overall
[Chart: ROUGE-L and BERTScore over time. Highlighted point: UAV, ROUGE-L 28.6, May 25, 2026.]
Updated 8d ago
Evaluation Results
Method      Method details              Date     ROUGE-L  BERTScore
UAV         Donor Model=Llama-3.1-...   2026.05  28.6      0.379
LatentQA    Donor Model=Llama-3.1-...   2026.05  28.5      0.37
UAV         Donor Model=Llama-3.1-...   2026.05  27.4      0.369
AO          Donor Model=Llama-3.1-...   2026.05  27.3      0.362
UAV         Donor Model=Qwen3-4B-I...   2026.05  26        0.347
UAV         Donor Model=Llama-3.1-...   2026.05  25.7      0.346
UAV         Donor Model=Qwen3-4B-I...   2026.05  25.4      0.344
LatentQA    Donor Model=Qwen3-4B-I...   2026.05  23.5      0.327
AO          Donor Model=Qwen3-4B-I...   2026.05  19.8      0.234
PatchScope  Donor Model=Llama-3.1-...   2026.05  7.1      -0.147
SelfIE      Donor Model=Llama-3.1-...   2026.05  6.6      -0.136
PatchScope  Donor Model=Qwen3-4B-I...   2026.05  6.5      -0.19
SelfIE      Donor Model=Qwen3-4B-I...   2026.05  6.4      -0.199