BERT

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Weak scaling | BERT-Base (train) | Memory (MB): 3,707.01 | 15 |
| Masked Language Modelling | BERT (val) | Accuracy: 65.54 | 14 |
| End-to-end inference tuning | BERT-Large | Tuning Time (s): 22.6 | 9 |
| End-to-end inference tuning | BERT Base | Tuning Time (s): 23.3 | 9 |
| Inference Latency | BERT-base | Attention Layer Latency (s): 40.54 | 6 |
| Masked Language Modeling | BERT large | vNMSE: 0.0022 | 6 |
| Device Placement | BERT | Latency per G: 0.0027 | 6 |
| Language Model Pre-training | BERT-Large NVIDIA V100 (train) | Max Batch Size: 96 | 6 |
| Language Model Pre-training | BERT-Large NVIDIA 2080 Ti (train) | Max Batch Size: 50 | 6 |
| Inference | BERT base | Speedup: 21.5 | 5 |
| Secure Transformer Inference | BERT-base | Online Overhead (GB): 2.2 | 4 |
| Privacy-Preserving Inference | BERT Large (inference) | GeLU Time (s): 0.351 | 4 |
| Privacy-Preserving Inference | BERT Base (inference) | GeLU Time (s): 0.351 | 4 |
| Recursive circuit gate count analysis | BERT base | Nova: 589,824 | 1 |
| Masked Language Modeling | BERT MLM small (val) | Validation Loss: 6.9412 | 1 |