Faithfulness Detection on FEVER n=200
[Chart: M4 / Binary Accuracy; data point: HHEM-2.1-Open, Apr 25, 2026]
Updated 1mo ago
Evaluation Results

Method              Configuration                Links     M4     Binary Accuracy
HHEM-2.1-Open       Mode=in-dist on FEVER,...    2026.04   1      86
RAGAS faithfulness  Judge=gpt-5.4, Four-wa...    2026.04   0.63   88
GSAR                Mode=adversarial mode,...    2026.04   0.31   -
GSAR                Mode=adversarial mode,...    2026.04   0.27   -
GSAR                Mode=single mode, Judg...    2026.04   0.24   -
GSAR                Mode=single mode, Judg...    2026.04   0.06   -