| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Weak Scaling | BERT-Base (train) | Memory (MB) | 3,707.01 | 15 |
| Masked Language Modeling | BERT (val) | Accuracy | 65.54 | 14 |
| Masked Language Modeling | BERT Large | vNMSE | 0.0022 | 6 |
| Device Placement | BERT | Latency per G | 0.0027 | 6 |
| Language Model Pre-training | BERT-Large, NVIDIA V100 (train) | Max Batch Size | 96 | 6 |
| Language Model Pre-training | BERT-Large, NVIDIA 2080 Ti (train) | Max Batch Size | 50 | 6 |
| Privacy-Preserving Inference | BERT Large (inference) | GeLU Time (s) | 0.351 | 4 |
| Privacy-Preserving Inference | BERT Base (inference) | GeLU Time (s) | 0.351 | 4 |