This is the repository for the paper "Robustness Matters: Pre-Training can Enhance the Performance of Encrypted Traffic Analysis", accepted by IEEE TIFS 2025.
PyTorch >= 2.3.0
Transformers >= 4.41.1
Zeek >= 6.0.3
Use the Zeek plug-in to parse pcap files and generate Zeek logs. In particular, ps.log records the packet length sequence.
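As an illustration of consuming ps.log downstream, here is a minimal reader sketch. The field layout is an assumption (a tab-separated record whose second field is a comma-separated packet length sequence, signs encoding direction); the real plug-in's output format may differ.

```python
def parse_ps_line(line):
    """Parse one hypothetical ps.log record into a packet length sequence.

    Assumed layout: "uid<TAB>64,-1448,1448,..." -- adjust to the
    actual fields emitted by the Zeek plug-in.
    """
    uid, seq_field = line.rstrip("\n").split("\t")[:2]
    lengths = [int(x) for x in seq_field.split(",") if x]
    return uid, lengths

uid, seq = parse_ps_line("CUM9Kb1\t64,-1448,1448,-52\n")
```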
Pre-training relies on large-scale unlabeled packet length sequences. This stage learns effective representations of network traffic from the packet sequences.
CUDA_VISIBLE_DEVICES=0,1,2,3 python BERT_pretrain.py -D pretrain_data_dir -d 768 -nh 12 -l 12 -f 3072 -b batch_size -ev 10 -sv 5 -ep n_epochs -lr 5e-5

The pre-trained model is saved in "model/{timestamp}".
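The `-d 768 -nh 12 -l 12 -f 3072` flags match BERT-base dimensions. Assuming they map to the standard Transformers configuration fields (an assumption about the script's internals), the equivalent config is:

```python
from transformers import BertConfig

# Presumed mapping of the CLI flags to a BERT-base-sized configuration:
#   -d 768  -> hidden_size        -nh 12 -> num_attention_heads
#   -l 12   -> num_hidden_layers  -f 3072 -> intermediate_size
config = BertConfig(
    hidden_size=768,
    num_attention_heads=12,
    num_hidden_layers=12,
    intermediate_size=3072,
)
```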
We perform supervised fine-tuning (SFT) on a small labeled dataset for each specific downstream task.
python classifier_BERT_trainer.py -T timestamp -D dataset_dir -o output -b batch_size -ev eval_per_epoch -sv save_per_epoch -ep n_epochs -lr learning_rate

The fine-tuned model is saved in "model-classifier/{timestamp}".
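A minimal sketch of what the fine-tuning step amounts to, using a tiny stand-in config so it runs without the pre-trained checkpoint (in practice the trainer would load "model/{timestamp}" instead; the vocab size, sequence length, and label count here are illustrative):

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny illustrative config -- not the repo's actual model dimensions.
config = BertConfig(
    vocab_size=2000, hidden_size=64, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=128, num_labels=5,
)
model = BertForSequenceClassification(config)

# A batch of two token-ID sequences standing in for tokenized
# packet length sequences, with their class labels.
input_ids = torch.randint(0, 2000, (2, 16))
labels = torch.tensor([1, 3])
out = model(input_ids=input_ids, labels=labels)  # loss + per-class logits
```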
In addition, ML-based baselines are implemented in classifier_ML_trainer.py, including AppScanner, ETC-PS, and FlowLens.
python classifier_ML_trainer.py -T timestamp -D dataset_dir -o output

DL-based baselines are implemented in classifier_DL_trainer.py, including FS-Net and GraphDApp.
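Baselines such as AppScanner typically work on statistical features of the packet length sequence rather than on the raw sequence. The sketch below shows an illustrative feature extractor in that style; the exact feature set used by the repo's implementation may differ.

```python
import statistics

def length_stats(seq):
    """Illustrative AppScanner-style statistics of a packet length
    sequence (not the repo's exact feature set)."""
    return {
        "count": len(seq),
        "min": min(seq),
        "max": max(seq),
        "mean": statistics.mean(seq),
        "stdev": statistics.pstdev(seq),  # population std. deviation
    }

feats = length_stats([64, 1448, 1448, 52])
```

A classical classifier (e.g., a random forest) is then trained on such feature vectors.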
python classifier_DL_trainer.py -T timestamp -D dataset_dir -o output -b batch_size -ev eval_per_epoch -ep n_epochs -lr learning_rate

For BERT-ps, we perform the robustness test on the test set.
For each sample in the test set, we calculate its

python RS_test_BERT.py -D dataset -m model -mt model_type -st shuffle_type -r rate -n0 n0 -n n -b batch_size -o output -N n_sample

After this process, we can calculate the
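The `-st shuffle_type -r rate -N n_sample` flags suggest that each sample is perturbed repeatedly (e.g., by shuffling a fraction `rate` of its positions) and the stability of the prediction is measured. The sketch below illustrates that idea with a dummy order-invariant classifier; the semantics of the actual flags and the scoring used by the script are assumptions.

```python
import random

def shuffle_fraction(seq, rate, rng):
    """Shuffle a randomly chosen fraction `rate` of positions in `seq`."""
    idx = rng.sample(range(len(seq)), max(1, int(rate * len(seq))))
    vals = [seq[i] for i in idx]
    rng.shuffle(vals)
    out = list(seq)
    for i, v in zip(idx, vals):
        out[i] = v
    return out

def consistency(classify, seq, rate, n_sample, seed=0):
    """Fraction of perturbed copies that keep the clean prediction."""
    rng = random.Random(seed)
    base = classify(seq)
    hits = sum(classify(shuffle_fraction(seq, rate, rng)) == base
               for _ in range(n_sample))
    return hits / n_sample

# Dummy classifier keyed on the sum of lengths: order-invariant,
# hence fully robust to shuffling perturbations.
clf = lambda s: sum(s) > 0
score = consistency(clf, [64, -1448, 1448, -52, 300], rate=0.5, n_sample=100)
```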
In addition, the robustness tests of the ML-based and DL-based baselines are implemented in RS_test_ML.py and RS_test_DL.py, respectively.
python RS_test_ML.py -D dataset -m model -mt model_type -st shuffle_type -r rate -n0 n0 -n n -b batch_size -o output -N n_sample
python RS_test_DL.py -D dataset -m model -mt model_type -st shuffle_type -r rate -n0 n0 -n n -b batch_size -o output -N n_sample

For any work related to encrypted traffic analysis, please cite our paper as:
@article{Yang2025Robust,
title={Robustness Matters: Pre-Training Can Enhance the Performance of Encrypted Traffic Analysis},
author={Yang, Luming and Liu, Lin and Huang, Jun-Jie and Shi, Jiangyong and Fu, Shaojing and Wang, Yongjun and Su, Jinshu},
journal={IEEE Transactions on Information Forensics and Security (TIFS)},
year={2025},
volume={20},
pages={10588-10603},
doi={10.1109/TIFS.2025.3613970},
}