| WMT Benchmarks Average WMT'14 & WMT'19 (aggregation) | LLM-PP | MAE 0.29 | | 16 | 4d ago |
| WMT En-De 2019 (val) | LLM-PP | MAE 0.29 | | 16 | 4d ago |
| WMT En-Fr 2014 (val) | LLM-PP | MAE 0.28 | | 16 | 4d ago |
| WMT En-De 2014 (val) | LLM-Distill-PP | MAE 0.22 | | 16 | 4d ago |
| Large Model Performance Prediction Dataset 80% masking (test) | STAR | RMSE 7.5 | | 10 | 4d ago |
| MNLI source domains (out-of-domain) | Cosine distance (fine-tuned) | ROC AUC 0.683 | | 10 | 4d ago |
| MNLI source domains (in-domain) | Cosine distance (fine-tuned) | ROC AUC 0.699 | | 10 | 4d ago |
| Sentiment temporal (out-of-domain) | Cosine distance (fine-tuned) | ROC AUC 0.834 | | 10 | 4d ago |
| Sentiment temporal (in-domain) | Cosine distance (fine-tuned) | ROC AUC 0.852 | | 10 | 4d ago |
| Sentiment categories (out-of-domain) | Cosine distance (fine-tuned) | ROC AUC 0.822 | | 10 | 4d ago |
| Sentiment categories (in-domain) | Cosine distance (fine-tuned) | ROC AUC 0.845 | | 10 | 4d ago |
| Tatoeba | | MAE 5.82 | | 9 | 4d ago |
| MewsliX | MAML | MAE 9.33 | | 9 | 4d ago |
| LAREQA | | MAE 1.51 | | 9 | 4d ago |
| XQUAD | MDGPR | MAE 3.15 | | 9 | 4d ago |
| TyDiQA | | MAE 4.29 | | 9 | 4d ago |
| XCOPA | MDGPR | MAE 1.96 | | 9 | 4d ago |
| PAWS | | MAE 0.8 | | 9 | 4d ago |
| MLQA | Group Lasso | MAE 2.21 | | 9 | 4d ago |
| AmericasNLP Spanish to Bribri 2023 (test) | LLM-PP | MAE 0.32 | | 4 | 4d ago |
| AmericasNLP Chatino to Spanish 2023 (test) | LLM-PP | MAE 1.21 | | 4 | 4d ago |
| AmericasNLP Bribri to Spanish 2023 (test) | LLM-PP | MAE 0.16 | | 4 | 4d ago |
| Chatino to Spanish | Neuron-wise MoS | MAE 0.01 | | 4 | 4d ago |
| Bribri to Spanish | LLM-PP | MAE 0.01 | | 4 | 4d ago |
| BIG-bench (test) | Size | Boolean Expressions 1.144 | | 3 | 4d ago |