FlowletFormer: Network Behavioral Semantic Aware Pre-training Model for Traffic Classification
About
Network traffic classification using pre-training models has shown promising results, but existing methods struggle to capture packet structural characteristics, flow-level behaviors, hierarchical protocol semantics, and inter-packet contextual relationships. To address these challenges, we propose FlowletFormer, a BERT-based pre-training model specifically designed for network traffic analysis. FlowletFormer introduces a Coherent Behavior-Aware Traffic Representation Model for segmenting traffic into semantically meaningful units, a Protocol Stack Alignment-Based Embedding Layer to capture multilayer protocol semantics, and Field-Specific and Context-Aware Pretraining Tasks to enhance both inter-packet and inter-flow learning. Experimental results demonstrate that FlowletFormer significantly outperforms existing methods in the effectiveness of traffic representation, classification accuracy, and few-shot learning capability. Moreover, by effectively integrating domain-specific network knowledge, FlowletFormer shows better comprehension of the principles of network transmission (e.g., stateful connections of TCP), providing a more robust and trustworthy framework for traffic analysis.
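The abstract does not specify how traffic is segmented into semantically meaningful units. As a rough illustration only, the sketch below uses the classic networking notion of a flowlet: a burst of packets within a flow, separated from the next burst by an idle gap. The function name `split_into_flowlets`, the `gap_threshold` parameter, and the example timestamps are hypothetical and are not taken from the paper, whose actual segmentation rule may differ.

```python
from typing import List, Sequence


def split_into_flowlets(
    timestamps: Sequence[float], gap_threshold: float
) -> List[List[int]]:
    """Group packet indices of one flow into flowlets.

    Assumption (not from the paper): a new flowlet starts whenever the
    idle time since the previous packet exceeds gap_threshold seconds.
    Timestamps are expected in non-decreasing order.
    """
    flowlets: List[List[int]] = []
    for i, ts in enumerate(timestamps):
        if i == 0 or ts - timestamps[i - 1] > gap_threshold:
            # Idle gap exceeded (or first packet): open a new flowlet.
            flowlets.append([i])
        else:
            # Packet belongs to the current burst.
            flowlets[-1].append(i)
    return flowlets


# Hypothetical packet arrival times (seconds) for a single flow.
example_timestamps = [0.00, 0.01, 0.02, 0.50, 0.51, 1.20]
example_flowlets = split_into_flowlets(example_timestamps, gap_threshold=0.1)
```

With these example values, the packets fall into three bursts; each list of indices could then be tokenized as one input unit for a BERT-style model.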
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Encrypted Traffic Classification | ISCX Tor 2016 | Accuracy | 92.15 | 22 |
| Encrypted Traffic Classification | CIC-IoT 2022 | Accuracy | 91.09 | 21 |
| Encrypted Traffic Classification | CSTNET-TLS | Accuracy (AC) | 86.05 | 20 |
| Encrypted Traffic Classification | ISCX-VPN Service | Accuracy | 94 | 12 |
| Encrypted Traffic Classification | ISCX-VPN APP | Accuracy | 84.8 | 12 |
| Encrypted Traffic Classification | USTC-TFC | Accuracy | 96.5 | 12 |
| Encrypted Traffic Classification | ISCXVPN 2016 | Accuracy (AC) | 94 | 10 |
| Encrypted Traffic Classification | USTC-TFC 2016 | Accuracy | 96.5 | 10 |