Direct Question Answering on 5% (forget set)
[Chart: ROUGE-L Score by date; plotted point: NPO, 36, Oct 20, 2025. Selectable metrics: ROUGE-L Score, LLM Judge Score.]
Evaluation Results
| Method | Configuration | Links | ROUGE-L Score | LLM Judge Score |
|--------|---------------|-------|---------------|-----------------|
| NPO | Model=Gemma-2B-IT, Var... | 2025.10 | 36 | 25 |
| UNDIAL | Model=Gemma-2B-IT, Var... | 2025.10 | 34 | 38 |
| UNDIAL | Model=Gemma-2B-IT, Var... | 2025.10 | 33 | 39 |
| UNDIAL | Model=Qwen3-8B, Varian... | 2025.10 | 33 | 39 |
| UNDIAL | Model=Qwen3-8B, Varian... | 2025.10 | 32 | 38 |
| NPO | Model=Gemma-2B-IT, Var... | 2025.10 | 31 | 19 |
| NPO | Model=Qwen3-8B, Varian... | 2025.10 | 29 | 20 |
| NPO | Model=Qwen3-8B, Varian... | 2025.10 | 27 | 14 |
| RMU | Model=Gemma-2B-IT, Var... | 2025.10 | 13 | 1 |
| RMU | Model=Qwen3-8B, Varian... | 2025.10 | 13 | 1 |
| RMU | Model=Qwen3-8B, Varian... | 2025.10 | 10 | 0 |
| RMU | Model=Gemma-2B-IT, Var... | 2025.10 | 4 | 0 |
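
For readers unfamiliar with the ROUGE-L Score column, the following is a minimal sketch of how a ROUGE-L F-measure between a reference answer and a model answer is commonly computed from the longest common subsequence (LCS) of their tokens. It is not taken from this benchmark: the whitespace tokenization, lowercasing, the use of the F-measure rather than recall, and the function names `lcs_length` and `rouge_l` are all assumptions made for illustration, and the benchmark's own scoring and scaling may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F-measure in [0, 1].

    Assumption: lowercase + whitespace tokenization (hypothetical choice,
    not confirmed by the benchmark).
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Multiply by 100 to compare with a 0-100 scale (also an assumption).
    print(round(100 * rouge_l("a b c d", "a c"), 2))
```

Under these assumptions, an answer that reproduces the reference exactly scores 1.0 and an answer sharing no tokens with it scores 0.0.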