| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| AlignBench | GPT-4 | Agreement | 74.69 | 18 | 4d ago |
| DeepfakeJudge Meta-Human | | Pairwise Accuracy | 99.4 | 12 | 4d ago |
| DeepfakeJudge Meta | DeepfakeJudge-7B | Pairwise Accuracy | 96.2 | 12 | 4d ago |
| LLMEval | GPT-4 | Agreement | 0.5098 | 10 | 4d ago |
| AUTO-J Eval-P | GPT-4 | Agreement | 62.28 | 10 | 4d ago |
| SummEval (anchor set) | GPT-4o | Accuracy | 94.5 | 6 | 4d ago |