Direct Question Answering on PISTOL (A_B)
Figure: ROUGE-L over time (top result: NPO, 79.1, Oct 20, 2025). The chart can also be switched to LLM Judge Score.
Evaluation Results

| Method | Variant | Date | ROUGE-L | LLM Judge Score |
|---|---|---|---|---|
| NPO | Context-aware | 2025.10 | 79.1 | 0 |
| NPO | Vanilla | 2025.10 | 78.8 | 0 |
| RMU | Context-aware | 2025.10 | 76.5 | 10 |
| RMU | Vanilla | 2025.10 | 70.5 | 5 |
| UNDIAL | Context-aware | 2025.10 | 50.2 | 5 |
| UNDIAL | Vanilla | 2025.10 | 49.9 | 10 |
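The leaderboard reports ROUGE-L without defining it. The following is a minimal sketch of the standard LCS-based ROUGE-L F-measure. It assumes β = 1, lowercased whitespace tokenization, and that the scores above are this value scaled to 0-100. The function names `lcs_length` and `rouge_l` are hypothetical. The benchmark's own scorer may tokenize, stem, or aggregate differently.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest common subsequence, keeping only one previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    # Assumption: lowercased whitespace tokens; real scorers may stem or strip punctuation.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    # F-measure weighting recall beta times as much as precision.
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    score = rouge_l("the cat sat on the mat", "the cat is on the mat")
    print(round(100 * score, 1))  # scaled to 0-100 like the table
```

In the usage example, the LCS is "the cat on the mat" (5 of 6 tokens on each side). Precision and recall are therefore both 5/6, and the printed score is 83.3.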