| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reasoning Evaluation | DeepfakeJudge Reason 1.0 (test) | BLEU-1: 9 | 16 |
| Deepfake Detection | DeepfakeJudge-Detect (test) | Accuracy (Real): 96.6 | 15 |
| Pointwise Reasoning Evaluation | DeepfakeJudge Meta-Human | RMSE: 0.5 | 12 |
| Pointwise Reasoning Evaluation | DeepfakeJudge Meta | RMSE: 0.61 | 12 |
| Pairwise Comparison | DeepfakeJudge Meta-Human | Pairwise Accuracy: 99.4 | 12 |
| Pairwise Comparison | DeepfakeJudge Meta | Pairwise Accuracy: 96.2 | 12 |