ahuangfeng/service-anomaly-detection-system

Service Anomaly Detection Engine for monitoring BGP/MPLS L3 VPN and Internet connectivity services

Service Anomaly Detection engine tailored for service provider networks, with anomaly detection strategies for Layer 3 BGP/MPLS VPNs and Internet connectivity services.

Research and development was funded by Swisscom and carried out at INSA Lyon. The anomaly detection strategies are currently deployed in the Swisscom production network, monitoring more than 13,000 L3 VPN customers. An extensive analysis can be found in the papers listed below.

Associated Papers:

  • Alex Huang Feng, Pierre Francois, Maxence Younsi, Stéphane Frenot, Thomas Graf, Wanting Du, Paolo Lucente and Ahmed Elhassani. 2025. “Detecting Service Disruptions in Large BGP/MPLS VPN Networks”. In Proceedings of IEEE Transactions on Network and Service Management (TNSM) Special Issue “Resilient Communication Networks for an Hyper-Connected World”. doi: 10.1109/TNSM.2025.3588314
  • Alex Huang Feng, Pierre Francois, Stéphane Frenot, Thomas Graf, Wanting Du, and Paolo Lucente. 2023. “Daisy: Practical Anomaly Detection in large BGP/MPLS and BGP/SRv6 VPN Networks”. In Proceedings of the Applied Networking Research Workshop (ANRW’23). Association for Computing Machinery, New York, NY, USA, 8-14. doi: 10.1145/3606464.3606470
  • Alex Huang Feng, Pierre Francois, Kensuke Fukuda, Wanting Du, Thomas Graf, Paolo Lucente and Stéphane Frenot. 2024. “Practical Anomaly Detection in Internet Services: An ISP centric approach”. In Proceedings of IEEE/IFIP International Workshop on Analytics for Network and Service Management (AnNet’24). NOMS 2024 IEEE/IFIP Network Operations and Management Symposium, Seoul, Korea, 2024. doi: 10.1109/NOMS59830.2024.10575071

IETF Standards:

Requirements

  • Python 3.9.7

Install

This project relies on Python 3 (tested with Python 3.9.7).

$ pip3 install -r src/requirements.txt
$ cp config.example.vpn.json config.json
$ cp .env.example .env
$ # Modify .env and config.json with your own values

Configuration example

Notes:

  • data_delay: Data is sometimes cached before it is actually available in the database. In real-time mode, data_delay restricts requests to data that is already present, i.e. data up to (now() - <data_delay> seconds).
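The effect of data_delay on a real-time query can be pictured with the small sketch below. It is only an illustration, not the project's code: the function name `query_window` and the `window_seconds` parameter are hypothetical, and the only thing taken from the note above is that the upper bound of the request is now() minus data_delay seconds.

```python
from datetime import datetime, timedelta, timezone


def query_window(data_delay, window_seconds, now=None):
    """Return (start, end) of a query window that ends data_delay seconds ago.

    data_delay comes from the configuration note above; window_seconds is a
    hypothetical parameter used only to give the window a length.
    """
    if data_delay < 0 or window_seconds <= 0:
        raise ValueError("data_delay must be >= 0 and window_seconds > 0")
    if now is None:
        now = datetime.now(timezone.utc)
    # Do not request anything newer than now() - data_delay seconds.
    end = now - timedelta(seconds=data_delay)
    start = end - timedelta(seconds=window_seconds)
    return start, end


# Fixed reference time so the example is reproducible.
reference = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
start, end = query_window(data_delay=120, window_seconds=600, now=reference)
print(start.isoformat(), end.isoformat())
```

With a delay of 120 seconds and a reference time of 12:00:00, the window ends at 11:58:00, so rows that are still sitting in a cache at query time are simply not requested yet.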

Parameters table

| Parameter name | Definition | Data type | Range |
|---|---|---|---|
| udp/tcp_sum_percent_threshold | Threshold on the percentage difference between the sums of the two UDP/TCP traffic series being compared; a difference above it is considered a warning. | float | 0.0~1.0 |
| udp_pattern_percent_threshold | Threshold on the difference between the current and historic gradients, relative to the historic traffic at a point. | float | 0.0~1.0 |
| udp/tcp_sum_weight | Weight of the concern score based on the UDP/TCP traffic sum comparison. | float | 0.0~1.0 |
| udp_pattern_weight | Weight of the concern score based on the UDP curve deviation level. | float | 0.0~1.0 |
| tcp_spike_weight | Weight of the concern score based on the TCP curve spike level. | float | 0.0~1.0 |
| sensitivity | Coefficient of the non-linear function that projects the concern score into the range 0-1. The larger the coefficient, the more sensitive the function is to its initial value. | float | 0.0~+inf |
| time_delta_hours | Time difference in hours between the current traffic and the historic traffic it is compared with. | int | 0~+inf |
| pattern_length_minutes | Duration in minutes of the traffic pattern to be compared with its corresponding historic pattern. | int | 0~+inf |
| volume_length_minutes | Duration in minutes of the traffic sum to be compared with its corresponding historic sum. | int | 0~pattern_length_minutes |
| time_step_seconds | Aggregation step for the raw traffic: all traffic within one step is aggregated into a single time slot. | int | 60~pattern_length_minutes*60 |
| flatten_step | Number of steps (time slots) used to flatten the aggregated curve; the larger the number, the smoother the processed curve. | int | 1~pattern_length_minutes/time_step_seconds/2-1 |
| spike_percent_threshold | Relative threshold on the prominence for a peak to be detected as a spike. | float | 0.0~1.0 |
| spike_constant_threshold | Constant threshold on the prominence of a peak for it to be detected. | float | 0.0~+inf |
| drop_percent_threshold | Threshold on the difference between the average traffic before and after the spike, relative to the average traffic before it. If the traffic after the spike is lower than (1 - threshold) * mean(traffic_before_spike), it is considered a traffic drop after the spike. | float | 0.0~1.0 |
| width | Maximum semi-width (number of time slots) of a peak to be considered. | int | 1~pattern_length_minutes/time_step_seconds/2-1 |
| damp_coef | Damping ratio applied to old event counters per time slot. | float | 0.0~1.0 |
| previous slots | Number N of previous time slots whose BMP event counts are considered when calculating the accumulated event count. | int | 0~+inf |
| oc_monitor_minutes | Duration in minutes over which OC interface changes are monitored. | int | 2~+inf |
| dropped_bytes_threshold | Constant threshold on the dropped bytes. | int | 0~+inf |
| drop_model | File path of the dropped-bytes ML model; if it is given and valid, the dropped-bytes check is ML-based. | string | - |
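As an illustration of how one of these thresholds is meant to be read, the sketch below applies the rule stated for drop_percent_threshold: a drop is flagged when the traffic after the spike is lower than (1 - threshold) * mean(traffic_before_spike). The function name `is_drop_after_spike` and the sample values are hypothetical, and using the mean of the samples after the spike is an assumption based on the "average traffic before and after the spike" wording; the engine's own implementation may differ.

```python
from statistics import mean


def is_drop_after_spike(traffic_before, traffic_after, drop_percent_threshold):
    """Return True if traffic after a spike dropped below the allowed level.

    Assumption: "traffic after the spike" is taken as the mean of the
    samples after the spike, mirroring the mean used for the samples before.
    """
    if not traffic_before or not traffic_after:
        raise ValueError("both traffic series must be non-empty")
    if not 0.0 <= drop_percent_threshold <= 1.0:
        raise ValueError("drop_percent_threshold must be within 0.0~1.0")
    # Lowest level still considered normal after the spike.
    floor = (1.0 - drop_percent_threshold) * mean(traffic_before)
    return mean(traffic_after) < floor


# Hypothetical per-slot traffic volumes around a spike.
before = [100.0, 110.0, 90.0]  # mean 100
after = [40.0, 50.0, 60.0]     # mean 50

print(is_drop_after_spike(before, after, 0.3))  # floor 70, 50 is below it
print(is_drop_after_spike(before, after, 0.6))  # floor 40, 50 is not below it
```

A larger drop_percent_threshold therefore makes the check more tolerant: with 0.6, traffic has to fall by more than 60% of the pre-spike average before a drop is reported.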

Running migrations

Before running the project, the PostgreSQL schema must be in place. Run the scripts in scripts/database-migrations.

Running

$ # After having modified config.json and .env files
$ cd src
$ python3 main.py

or, with explicit paths to the configuration and environment files:

$ python3 main.py ../config.json ../.env

Contributors

I would like to thank Pierre Francois for the brainstorming sessions that led to the development of this work. Additionally, I would like to thank Wanting Du and Severin Dellsperger for their contributions to this project.
