Contextual Question Answering on PISTOL (A_B)
Top result: NPO, 97.6 ROUGE-L (Oct 20, 2025)
[Chart: ROUGE-L, LLM Judge Score]
Updated 6d ago
Evaluation Results
Method   Variant         Date      ROUGE-L   LLM Judge Score
NPO      Context-aware   2025.10   97.6      100
RMU      Context-aware   2025.10   97        100
UNDIAL   Context-aware   2025.10   96.5      100
NPO      Vanilla         2025.10   87.8      65
RMU      Vanilla         2025.10   78.4      25
UNDIAL   Vanilla         2025.10   67.6      95