Item-level Evaluation on Item-level QA dataset
[Chart: ROUGE-1 on the Item-level QA dataset. Top entry: LENS, 62.4 (Dec 28, 2025). Selectable metrics: ROUGE-1, ROUGE-2, ROUGE-L, BLEU-4, METEOR, BERTScore, Alignment Score.]
Updated 3mo ago
Evaluation Results
| Method | Model | Date | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-4 | METEOR | BERTScore | Alignment Score |
|---|---|---|---|---|---|---|---|---|---|
| LENS | Qwen2.5-14B-Inst... | 2025.12 | 62.4 | 46 | 60.3 | 38.3 | 61.1 | 83.2 | 73.2 |
| TS-Text | Qwen2.5-14B-Inst... | 2025.12 | 17.6 | 4.7 | 14.2 | 1.7 | 21.8 | 60.4 | 66.5 |
| TS-Image | Qwen2.5-VL-32B | 2025.12 | 16.2 | 4.4 | 13.6 | 1.6 | 21.1 | 60.2 | 68.7 |