
LIAR

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Reasoning Quality Correlation Analysis | LIAR | Somers' D: 0.2769 | 45 |
| Fact Checking | LIAR | Accuracy@1: 69 | 33 |
| Veracity Prediction | LIAR RAW | Macro F1: 50.59 | 32 |
| Fact Verification | LIAR | F1 Score: 68.6 | 24 |
| Multi-Classification | LIAR Open | Accuracy: 46.81 | 23 |
| Binary Classification | LIAR Closed | Accuracy: 79.15 | 23 |
| Multi-Classification | LIAR Closed | Accuracy: 26.99 | 22 |
| Binary Classification | LIAR Open | Accuracy: 84.21 | 22 |
| Fake News Detection | LIAR (test) | Accuracy: 65.2 | 21 |
| Fact-checking | LIAR-RAW | Precision: 77.38 | 20 |
| Fake News Detection | LIAR (val) | Accuracy: 27.7 | 13 |
| Fact-checking | LIAR | Accuracy: 79 | 12 |
| Claim Verification | LIAR (test) | Precision: 46.8 | 12 |
| Veracity Explanation Ranking | LIAR RAW | Informativeness (MAR): 2.09 | 12 |
| Veracity Prediction | LIAR-RAW (test) | Precision: 43.83 | 12 |
| Reasoning | LIAR Ambiguity-Augmented subset of 200 samples | Accuracy@1: 69 | 11 |
| Fake News Detection | LIAR ambiguity-augmented | Accuracy: 68.9 | 11 |
| Fact-Checking | LIAR (test) | Accuracy: 68.2 | 11 |
| Explanation Generation | LIAR-RAW (test) | ROU-1: 25.5 | 11 |
| Node classification | LIAR (test) | Fidelity: 100 | 8 |
| Adversarial Attack | LIAR-NEW Perplexity | Attack Success Rate (ASR): 19.95 | 7 |
| Adversarial Attack | LIAR ClaimBuster NEW | Attack Success Rate (ASR): 97.02 | 7 |
| Adversarial Attack | LIAR-NEW Verifact | Attack Success Rate: 40.34 | 7 |
| Adversarial Attack | LIAR ICL NEW | Attack Success Rate: 30.35 | 7 |
| Explanation Quality Evaluation | LIAR RAW | Meaningfulness Score: 2.29 | 7 |
Showing 25 of 33 rows