Contextual Question Answering on 5% (forget set)
[Chart: ROUGE-L by date. Top result: RMU, ROUGE-L 91 (Oct 20, 2025). Available metrics: ROUGE-L, LLM Judge Score.]
Updated 6d ago
Evaluation Results
Method   Configuration               Links     ROUGE-L   LLM Judge Score
------   -------------------------   -------   -------   ---------------
RMU      Model=Gemma-2B-IT, Var...   2025.10   91        99
NPO      Model=Gemma-2B-IT, Var...   2025.10   87        98
UNDIAL   Model=Gemma-2B-IT, Var...   2025.10   87        98
UNDIAL   Model=Qwen3-8B, Varian...   2025.10   68        98
RMU      Model=Qwen3-8B, Varian...   2025.10   67        97
NPO      Model=Qwen3-8B, Varian...   2025.10   63        95
UNDIAL   Model=Qwen3-8B, Varian...   2025.10   59        97
NPO      Model=Gemma-2B-IT, Var...   2025.10   55        81
UNDIAL   Model=Gemma-2B-IT, Var...   2025.10   53        82
NPO      Model=Qwen3-8B, Varian...   2025.10   46        84
RMU      Model=Qwen3-8B, Varian...   2025.10   18        5
RMU      Model=Gemma-2B-IT, Var...   2025.10   1         0
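
For readers unfamiliar with the ROUGE-L column, the following is a minimal sketch of how such a score can be computed from the longest common subsequence (LCS) of a reference answer and a model answer. It is an illustration under stated assumptions, not the benchmark's own scoring code: the page does not say which tokenizer is used or whether recall or F-measure is reported, so this sketch assumes lowercase whitespace tokens and the F1 variant, scaled to 0-100 to match the table. The function names `lcs_length` and `rouge_l` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1 on a 0-100 scale.

    Assumption: lowercase whitespace tokenization and the F1 variant;
    the leaderboard's actual settings are not stated on the page.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 100.0 * 2 * precision * recall / (precision + recall)
```

With this sketch, an answer identical to the reference scores 100, and an answer sharing no tokens with it scores 0. A production evaluation would more likely rely on an established ROUGE implementation with stemming and proper tokenization.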