Data Contamination Detection on SAT
[Chart: F1 Score and AUC over time. Highlighted point: Entropy-Noise, F1 Score 79, Oct 10, 2025]
Evaluation Results

Method          Target Model    Date      F1 Score   AUC
Entropy-Noise   Qwen2.5-7...    2025.10   79         77
Self-Critique   Qwen2.5-7...    2025.10   69         67
Min-K%          DeepSeek-...    2025.10   69         35
Recall          DeepSeek-...    2025.10   69         62
PPL             DeepSeek-...    2025.10   68         64
Entropy-Noise   DeepSeek-...    2025.10   67         45
Entropy-Temp    Qwen2.5-7...    2025.10   66         69
CDD             DeepSeek-...    2025.10   66         50
Self-Critique   DeepSeek-...    2025.10   66         67
Entropy-Temp    DeepSeek-...    2025.10   64         61
Recall          Qwen2.5-7...    2025.10   62         62
Min-K%++        DeepSeek-...    2025.10   62         49
CDD             Qwen2.5-7...    2025.10   57         47
PPL             Qwen2.5-7...    2025.10   54         50
Min-K%          Qwen2.5-7...    2025.10   32         50
Min-K%++        Qwen2.5-7...    2025.10   0          31