
Hallucination Detection on TruthfulQA (AUROC and F1-Score)

CausalGaze: 0.8851 AUROC

[Chart: AUROC over time, Apr 13, 2026 to May 29, 2026]
Updated 1d ago

Evaluation Results

Date      AUROC    F1-Score
2026.04   0.8851   79.71
2026.04   0.8805   81.89
2026.04   0.8803   77.14
2026.04   0.868    80.71
2026.04   0.8647   77.47
2026.04   0.8392   77.06
2026.04   0.817    82.8
2026.04   0.8142   78.58
2026.04   0.8112   78.83
2026.05   0.81     0.78
2026.04   0.7993   80
2026.04   0.7937   77.4
2026.05   0.79     0.76
2026.04   0.782    79.03
2026.05   0.7576   -
2026.05   0.75     0.72
2026.05   0.7304   -
2026.05   0.7249   -
2026.05   0.7192   -
2026.05   0.7168   -
2026.05   0.711    -
2026.05   0.7073   -
2026.05   0.7059   -
2026.05   0.7036   -
2026.05   0.6966   -
2026.05   0.6827   -
2026.05   0.6801   -
2026.04   0.6732   65.4
2026.05   0.6679   -
2026.04   0.667    65.94
2026.05   0.6615   -
2026.04   0.661    67.87
2026.05   0.6601   -
2026.05   0.66     0.69
2026.04   0.6557   61.36
2026.05   0.6541   -
2026.04   0.651    66.3
2026.05   0.647    -
2026.05   0.6446   -
2026.05   0.6394   -
2026.04   0.637    65.42
2026.04   0.6316   60.06
2026.05   0.6289   -
2026.05   0.6251   -
2026.05   0.6233   -
2026.04   0.6217   61.45
2026.04   0.617    62.2
2026.04   0.616    59.26
2026.05   0.6158   -
2026.04   0.6151   61.87
2026.05   0.6113   -
2026.05   0.61     0.67
2026.04   0.6012   58.6
2026.05   0.5966   -
2026.05   0.5912   -
2026.05   0.59     0.67
2026.05   0.5833   -
2026.05   0.5808   -
2026.05   0.5796   -
2026.04   0.5771   53.45
2026.04   0.5763   57.2
2026.05   0.5712   -
2026.04   0.5677   60.34
2026.05   0.564    -
2026.04   0.5538   54.21
2026.05   0.5507   -
2026.05   0.55     0.67
2026.05   0.54     0.67
2026.05   0.54     0.67
2026.04   0.537    54.89
2026.05   0.5324   -
2026.05   0.53     0.67
2026.05   0.53     0.67
2026.05   0.53     0.68
2026.04   0.5295   53.45
2026.05   0.529    -
2026.04   0.526    51.25
2026.05   0.5219   -
2026.05   0.5203   -
2026.05   0.52     0.67
2026.04   0.5193   52.86
2026.04   0.5181   55.56
2026.05   0.5154   -
2026.05   0.5084   -
2026.05   0.49     0.67
2026.05   0.47     0.67
2026.05   0.43     0.67
2026.05   0.35     0.67
2026.05   0.34     0.67
2026.05   0.34     0.67
2026.05   0.34     0.67
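
For readers less familiar with the two metrics reported here, the following is a minimal sketch of how AUROC and F1 are commonly computed for a binary hallucination detector. It is not the evaluation code behind these results. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. F1 appears in the results on both a 0-100 and a 0-1 scale, so the sketch returns a fraction that can be multiplied by 100 where needed.

```python
def auroc(labels, scores):
    """Probability that a random positive scores above a random negative.

    Ties count as half a win. Label 1 = hallucination (assumed convention).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 of the thresholded predictions, as a fraction in [0, 1]."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical detector outputs: higher score = more likely hallucinated.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

print(round(auroc(labels, scores), 4))           # 0.8889
print(round(f1_score(labels, scores), 4))        # 0.6667
print(round(100 * f1_score(labels, scores), 2))  # 66.67 (percentage scale)
```

AUROC depends only on how the scores rank the examples, while F1 depends on the chosen threshold. On a balanced toy set like this one, flagging every answer (threshold 0.0) also yields an F1 of 2/3, whatever the ranking quality.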