Long Question Answering on MM-Telco Long QA
[Chart: ROUGE-1 over time. Top entry: GPT-4o, 41 (Nov 17, 2025).]
Metrics: ROUGE-1, ROUGE-2, ROUGE-L, SEM Score, LLM Judge Score, sacreBLEU
Updated 1mo ago
Evaluation Results
Method        Parameters  Date     ROUGE-1  ROUGE-2  ROUGE-L  SEM Score  LLM Judge Score  sacreBLEU
GPT-4o        -           2025.11  41       15       23       91         75.39            10
Llama3.2 3B   3B          2025.11  39       13       22       76         45.41            8.25
Phi 4 14B     14B         2025.11  39       13       20       89         61.14            8.53
Llama3.1 8B   8B          2025.11  39       12       20       89         60.2             8.29
Nemotron 70B  70B         2025.11  38       12       20       90         72.04            6.58
Qwen2.5VL 7B  7B          2025.11  28       11       18       88         43.23            3.4