BERT-ps

The repository for the paper "Robustness Matters: Pre-Training Can Enhance the Performance of Encrypted Traffic Analysis", accepted by TIFS'25.

Requirements

PyTorch >= 2.3.0

Transformers >= 4.41.1

Zeek >= 6.0.3

Pre-processing

Use the Zeek plug-in to parse the pcap file and generate Zeek logs. Specifically, ps.log records the packet length sequence.
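As a rough illustration of consuming the generated log, the sketch below reads a log in Zeek's default tab-separated ASCII format, where a `#fields` header line names the columns. The column names `uid` and `ps` and the comma-separated encoding of the lengths are assumptions made for this example; check the header of your own ps.log for the real layout.

```python
import io

def read_zeek_log(stream):
    """Parse a Zeek TSV log into a list of dicts keyed by the #fields header."""
    fields, rows = [], []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#fields"):
            # Column names follow the "#fields" tag, tab-separated.
            fields = line.split("\t")[1:]
        elif line.startswith("#"):
            # Other metadata lines (#separator, #types, #close, ...).
            continue
        else:
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows

def parse_lengths(value, sep=","):
    """Turn a delimited string such as '120,-1460,52' into a list of ints.

    Zeek writes '-' for unset and '(empty)' for empty fields; both map to [].
    """
    return [int(tok) for tok in value.split(sep) if tok not in ("", "-", "(empty)")]

# Hypothetical sample; the real ps.log columns may differ.
SAMPLE = (
    "#separator \\x09\n"
    "#fields\tuid\tps\n"
    "#types\tstring\tvector[int]\n"
    "Cabc\t120,-1460,52\n"
)

if __name__ == "__main__":
    for row in read_zeek_log(io.StringIO(SAMPLE)):
        print(row["uid"], parse_lengths(row["ps"]))
```

Replace `io.StringIO(SAMPLE)` with `open("ps.log")` to read a real file.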

Pre-training

Pre-training relies on large-scale unlabeled packet length sequences. This stage lets the model learn effective representations of network traffic from those sequences.

CUDA_VISIBLE_DEVICES=0,1,2,3 python BERT_pretrain.py -D pretrain_data_dir -d 768 -nh 12 -l 12 -f 3072 -b batch_size -ev 10 -sv 5 -ep n_epochs -lr 5e-5

The pre-trained model is saved in "model/{timestamp}".
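This README does not state the pre-training objective. Assuming a BERT-style masked-token objective, the sketch below shows how a tokenized packet length sequence could be corrupted for self-supervised training. The function `mask_tokens`, the `[MASK]` symbol, and the 15% rate are illustrative assumptions, not taken from BERT_pretrain.py, and the sketch always substitutes `[MASK]` rather than using the 80/10/10 replacement scheme of the original BERT.

```python
import random

MASK = "[MASK]"   # hypothetical mask symbol
IGNORE = None     # label for positions that do not contribute to the loss

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for a masked-token objective.

    Each token is hidden independently with probability `mask_prob`;
    the label keeps the original token only at hidden positions.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE)
    return inputs, labels

if __name__ == "__main__":
    seq = ["120", "-1460", "52", "-1460", "310"]
    print(mask_tokens(seq, mask_prob=0.3, rng=random.Random(0)))
```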

Fine-tuning

We perform supervised fine-tuning (SFT) on a small labeled dataset for each specific downstream task.

python classifier_BERT_trainer.py -T timestamp -D dataset_dir -o output -b batch_size -ev eval_per_epoch -sv save_per_epoch -ep n_epochs -lr learning_rate

The fine-tuned model is saved in "model-classifier/{timestamp}".

In addition, ML-based baselines, including AppScanner, ETC-PS, and FlowLens, are implemented in classifier_ML_trainer.py.

python classifier_ML_trainer.py -T timestamp -D dataset_dir -o output

DL-based baselines, including FS-Net and GraphDApp, are implemented in classifier_DL_trainer.py.

python classifier_DL_trainer.py -T timestamp -D dataset_dir -o output -b batch_size -ev eval_per_epoch -ep n_epochs -lr learning_rate

Robustness analysis

For BERT-ps, we perform a robustness test on the test set. For each test sample, we estimate its $$p_A$$ and $$p_B$$ using the Monte Carlo method. There are three types of network shuffle (packet loss, retransmission, and disorder), each applied at different shuffle rates.

python RS_test_BERT.py -D dataset -m model -mt model_type -st shuffle_type -r rate -n0 n0 -n n -b batch_size -o output -N n_sample
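To make the three shuffle types concrete, here is a minimal sketch of how each could perturb a packet length sequence at a given rate. The function names, the per-packet independence, and the adjacent-swap model of disorder are assumptions for illustration; the actual perturbations in RS_test_BERT.py may be defined differently.

```python
import random

def packet_loss(seq, rate, rng):
    """Drop each packet independently with probability `rate`."""
    return [x for x in seq if rng.random() >= rate]

def retransmission(seq, rate, rng):
    """Duplicate each packet in place with probability `rate`."""
    out = []
    for x in seq:
        out.append(x)
        if rng.random() < rate:
            out.append(x)
    return out

def disorder(seq, rate, rng):
    """Swap a packet with its successor with probability `rate`."""
    out = list(seq)
    i = 0
    while i < len(out) - 1:
        if rng.random() < rate:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return out

# Hypothetical mapping from a shuffle-type name to its perturbation.
SHUFFLES = {
    "loss": packet_loss,
    "retransmission": retransmission,
    "disorder": disorder,
}

if __name__ == "__main__":
    rng = random.Random(0)
    seq = [120, -1460, 52, -1460, 310]
    for name, fn in SHUFFLES.items():
        print(name, fn(seq, 0.3, rng))
```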

After this process, we can calculate $$\Delta\hat{p} = \underline{p_A}-\overline{p_B}$$ for each sample, then generate the PA-curve and compute the PA-area.

In addition, the robustness tests for the ML-based and DL-based baselines are implemented in RS_test_ML.py and RS_test_DL.py, respectively.
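The per-sample quantity can be sketched as follows, taking $$p_A$$ and $$p_B$$ to be the probabilities of the most frequent and second most frequent predicted class under random perturbation (an assumption; the README does not spell this out). The confidence bounds here use a simple Hoeffding inequality rather than whatever bound the repository uses, and the function `estimate_delta_p` and its parameters are hypothetical. For brevity the same samples both select the top class and estimate its probability; a separate selection batch would avoid that selection bias.

```python
import math
import random
from collections import Counter

def estimate_delta_p(classify, perturb, seq, n, alpha=0.001, rng=None):
    """Monte Carlo estimate of lower(p_A) - upper(p_B) for one sample.

    classify: maps a sequence to a class label.
    perturb:  maps (sequence, rng) to a randomly perturbed sequence.
    n:        number of Monte Carlo draws.
    alpha:    total failure probability of the two one-sided bounds.
    """
    rng = rng or random.Random()
    counts = Counter(classify(perturb(seq, rng)) for _ in range(n))
    ranked = counts.most_common(2)
    p_a = ranked[0][1] / n
    p_b = ranked[1][1] / n if len(ranked) > 1 else 0.0
    # One-sided Hoeffding bound at level alpha/2 for each estimate.
    eps = math.sqrt(math.log(2 / alpha) / (2 * n))
    p_a_lower = max(0.0, p_a - eps)
    p_b_upper = min(1.0, p_b + eps)
    return p_a_lower - p_b_upper

if __name__ == "__main__":
    # Toy classifier: label depends on the sequence length only.
    classify = lambda s: "long" if len(s) >= 4 else "short"
    # Toy perturbation: drop each packet with probability 0.1.
    perturb = lambda s, r: [x for x in s if r.random() >= 0.1]
    seq = [120, -1460, 52, -1460, 310, 64]
    print(estimate_delta_p(classify, perturb, seq, n=2000, rng=random.Random(0)))
```

A positive value indicates that the top class stays separated from the runner-up under the sampled perturbations, within the stated confidence level.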

python RS_test_ML.py -D dataset -m model -mt model_type -st shuffle_type -r rate -n0 n0 -n n -b batch_size -o output -N n_sample
python RS_test_DL.py -D dataset -m model -mt model_type -st shuffle_type -r rate -n0 n0 -n n -b batch_size -o output -N n_sample

Citation

For any work related to the analysis of encrypted video traffic, please cite our paper as:

@article{Yang2025Robust,
  title={Robustness Matters: Pre-Training Can Enhance the Performance of Encrypted Traffic Analysis}, 
  author={Yang, Luming and Liu, Lin and Huang, Jun-Jie and Shi, Jiangyong and Fu, Shaojing and Wang, Yongjun and Su, Jinshu},
  journal={IEEE Transactions on Information Forensics and Security (TIFS)}, 
  year = {2025},
  volume={20},
  pages = {10588-10603},
  doi={10.1109/TIFS.2025.3613970},
}
