Scenario-based Filter Generation Benchmark
[Chart: ROUGE-1 by method. Top score: 18.98 (llama 3.2 3B), Nov 17, 2025. Metrics available: ROUGE-1, ROUGE-2, ROUGE-L, SEM Score, LLM Judge Score, SacreBLEU.]
Evaluation Results
| Method | Date | ROUGE-1 | ROUGE-2 | ROUGE-L | SEM Score | LLM Judge Score | SacreBLEU |
|---|---|---|---|---|---|---|---|
| llama 3.2 3B | 2025.11 | 18.98 | 7.6 | 13.98 | 88.38 | 61.03 | 3.49 |
| phi-4-mini 3.8B | 2025.11 | 18.67 | 7.87 | 13.54 | 88.97 | 66.63 | 4.07 |
| gpt-4o-mini | 2025.11 | 17.87 | 7.46 | 12.73 | 89.5 | 73.53 | 3.28 |
| Gemma3 4B | 2025.11 | 10.19 | 3.67 | 7.15 | 87.63 | 61.53 | 1.33 |