
BERT

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Weak scaling | BERT-Base (train) | Memory (MB): 3,707.01 | 15 |
| Masked Language Modelling | BERT (val) | Accuracy: 65.54 | 14 |
| Inference Latency | BERT-base | Attention Layer Latency (s): 40.54 | 6 |
| Masked Language Modeling | BERT large | vNMSE: 0.0022 | 6 |
| Device Placement | BERT | Latency per G: 0.0027 | 6 |
| Language Model Pre-training | BERT-Large NVIDIA V100 (train) | Max Batch Size: 96 | 6 |
| Language Model Pre-training | BERT-Large NVIDIA 2080 Ti (train) | Max Batch Size: 50 | 6 |
| Inference | BERT base | Speedup: 21.5 | 5 |
| Secure Transformer Inference | BERT-base | Online Overhead (GB): 2.2 | 4 |
| Privacy-Preserving Inference | BERT Large (inference) | GeLU Time (s): 0.351 | 4 |
| Privacy-Preserving Inference | BERT Base (inference) | GeLU Time (s): 0.351 | 4 |