| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| Nemotron F1 (test) | | Kendall Tau 0.707 | 2 | 1mo ago | |
| FLORES COMET (test) | Adaptive Multi-Model Ranking | Kendall Tau 0.677 | 2 | 1mo ago | |
| FLORES BLEU (test) | Adaptive Multi-Model Ranking | Kendall Tau (τ) 0.803 | 2 | 1mo ago | |
| TruthfulQA BERTScore (test) | Adaptive Multi-Model Ranking | Kendall's Tau 0.45 | 2 | 1mo ago | |
| TruthfulQA LLM-Judge (test) | Adaptive Multi-Model Ranking | Kendall's Tau 0.49 | 2 | 1mo ago | |
| GovReport ROUGE-L (test) | | Kendall Tau (τ) 0.823 | 2 | 1mo ago | |
| BioLaySumm FKGL (test) | Adaptive Multi-Model Ranking | Kendall Tau (τ) 0.8 | 2 | 1mo ago | |
| BioLaySumm BERTScore (test) | Adaptive Multi-Model Ranking | Kendall's Tau 0.903 | 2 | 1mo ago | |
| BioLaySumm ROUGE-L (test) | Adaptive Multi-Model Ranking | Kendall Tau (τ) 0.957 | 2 | 1mo ago | |