Direct Question Answering on PISTOL (A_C)
[Chart: ROUGE-L by method. Top result: NPO, 77.3 (Oct 20, 2025). Metrics shown: ROUGE-L, LLM Judge Score.]
Updated 6d ago
Evaluation Results
Method   Variant         Date      ROUGE-L   LLM Judge Score
NPO      Context-aware   2025.10   77.3      0
NPO      Vanilla         2025.10   76.5      0
RMU      Context-aware   2025.10   57.1      0
UNDIAL   Vanilla         2025.10   48.5      0
UNDIAL   Context-aware   2025.10   47.2      0
RMU      Vanilla         2025.10   31.5      0