Explanation Quality Evaluation on LIAR RAW
[Chart: Meaningfulness Score by method (selectable metrics: Meaningfulness, Informativeness, Soundness, Relevance Score). Top entry: ChatGPT w/ evi, 2.29 (Nov 25, 2025).]
Evaluation Results
Method           | Configuration             | Date    | Meaningfulness | Informativeness | Soundness | Relevance
-----------------|---------------------------|---------|----------------|-----------------|-----------|----------
ChatGPT w/ evi   | Backbone=ChatGPT, Evid... | 2025.11 | 2.29           | 3.71            | 4.04      | 3.99
ChatGPT w/o evi  | Backbone=ChatGPT, Evid... | 2025.11 | 2.27           | 3.93            | 4.29      | 4.50
L-Defense        | Backbone=LLaMA2           | 2025.11 | 2.20           | 4.39            | 4.64      | 4.63
L-Defense        | Backbone=ChatGPT          | 2025.11 | 2.06           | 4.12            | 4.28      | 4.47
SFT              | Backbone=LLaMA2           | 2025.11 | 1.90           | 4.48            | 4.60      | 4.65
Oracle - skyline | Backbone=ChatGPT, Evid... | 2025.11 | 1.85           | 4.44            | 4.60      | 4.69
S-EGS            | Backbone=LLaMA2           | 2025.11 | 1.77           | 4.58            | 4.66      | 4.83
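The table reports four metrics, and the ordering of methods depends on which one is read. The following is a minimal Python sketch that transcribes the scores and picks the top method per metric. It assumes higher is better on every metric (the page does not state the score scales), and the dictionary keys with the backbone in parentheses are hypothetical labels introduced here only to tell the two L-Defense rows apart.

```python
# Scores transcribed from the Evaluation Results table.
# Keys are hypothetical labels (method name + backbone) used to keep
# the two L-Defense rows distinct; truncated "Evid..." settings are omitted.
METRICS = ("Meaningfulness", "Informativeness", "Soundness", "Relevance")

RESULTS = {
    "ChatGPT w/ evi": (2.29, 3.71, 4.04, 3.99),
    "ChatGPT w/o evi": (2.27, 3.93, 4.29, 4.50),
    "L-Defense (LLaMA2)": (2.20, 4.39, 4.64, 4.63),
    "L-Defense (ChatGPT)": (2.06, 4.12, 4.28, 4.47),
    "SFT (LLaMA2)": (1.90, 4.48, 4.60, 4.65),
    "Oracle - skyline (ChatGPT)": (1.85, 4.44, 4.60, 4.69),
    "S-EGS (LLaMA2)": (1.77, 4.58, 4.66, 4.83),
}


def best_per_metric(results):
    """Return {metric: (method, score)} for the top method on each metric.

    Assumption: higher is better for every metric. Ties go to the row that
    appears first, because max() returns the first maximal item and dicts
    preserve insertion (table) order.
    """
    best = {}
    for i, metric in enumerate(METRICS):
        method = max(results, key=lambda m: results[m][i])
        best[metric] = (method, results[method][i])
    return best


if __name__ == "__main__":
    for metric, (method, score) in best_per_metric(RESULTS).items():
        print(f"{metric}: {method} ({score:.2f})")
```

On these numbers, ChatGPT w/ evi leads on Meaningfulness but has the lowest Informativeness, Soundness, and Relevance scores, while S-EGS is last on Meaningfulness yet first on the other three metrics.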