Generative QA Protocol Fact Retrieval
[Chart: ROUGE-L by date. Highlighted point: UAV, 29.5, May 25, 2026. Selectable metrics: ROUGE-L, BERTScore.]
Updated 8d ago
Evaluation Results

| Method | Details | Date | ROUGE-L | BERTScore |
|---|---|---|---|---|
| UAV | Donor Model=Llama-3.1-... | 2026.05 | 29.5 | 0.421 |
| LatentQA | Donor Model=Llama-3.1-... | 2026.05 | 28 | 0.408 |
| UAV | Donor Model=Llama-3.1-... | 2026.05 | 26.5 | 0.402 |
| AO | Donor Model=Llama-3.1-... | 2026.05 | 26.3 | 0.397 |
| UAV | Donor Model=Llama-3.1-... | 2026.05 | 24 | 0.381 |
| UAV | Donor Model=Qwen3-4B-I... | 2026.05 | 23.9 | 0.372 |
| UAV | Donor Model=Qwen3-4B-I... | 2026.05 | 23.3 | 0.374 |
| LatentQA | Donor Model=Qwen3-4B-I... | 2026.05 | 20 | 0.349 |
| AO | Donor Model=Qwen3-4B-I... | 2026.05 | 12.9 | 0.158 |
| SelfIE | Donor Model=Llama-3.1-... | 2026.05 | 2.2 | -0.285 |
| SelfIE | Donor Model=Qwen3-4B-I... | 2026.05 | 1.9 | -0.343 |
| PatchScope | Donor Model=Qwen3-4B-I... | 2026.05 | 1.8 | -0.34 |
| PatchScope | Donor Model=Llama-3.1-... | 2026.05 | 1.7 | -0.322 |
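
For readers unfamiliar with the ROUGE-L column, the following is a minimal sketch of how such a score is commonly computed: the longest common subsequence (LCS) between a generated answer and a reference answer, turned into an F1 value on a 0 to 100 scale. This page does not state the benchmark's tokenization, F-measure weighting, or scaling, so the whitespace tokenizer, lowercasing, balanced F1, and the function names `lcs_length` and `rouge_l_f1` are all assumptions made for illustration, not the leaderboard's actual scoring code.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Rolling one-row dynamic program: prev[j] holds the LCS length of the
    # tokens of `a` processed so far against the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as a balanced F1 score scaled to 0-100.

    Assumption: lowercased whitespace tokens; real evaluations may use a
    different tokenizer, stemming, or F-beta weighting.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical answer pair, not taken from the benchmark data.
    print(rouge_l_f1("paris is the capital", "the capital is paris"))
```

Under these assumptions, a score near 2 (as in the lowest rows) means almost no in-order token overlap with the reference, while a score near 30 means a moderate share of reference tokens is recovered in order.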