Service Anomaly Detection engine tailored for service provider networks, with anomaly detection strategies for Layer 3 BGP/MPLS VPNs and Internet connectivity services.
Research and development was funded by Swisscom and carried out at INSA Lyon. The anomaly detection strategies are currently deployed in the Swisscom production network, monitoring more than 13,000 L3 VPN customers. An extensive analysis can be found in the papers listed below.
Associated Papers:
- Alex Huang Feng, Pierre Francois, Maxence Younsi, Stéphane Frenot, Thomas Graf, Wanting Du, Paolo Lucente and Ahmed Elhassani. 2025. “Detecting Service Disruptions in Large BGP/MPLS VPN Networks”. In Proceedings of IEEE Transactions on Network and Service Management (TNSM) Special Issue “Resilient Communication Networks for an Hyper-Connected World”. doi: 10.1109/TNSM.2025.3588314
- Alex Huang Feng, Pierre Francois, Stéphane Frenot, Thomas Graf, Wanting Du, and Paolo Lucente. 2023. “Daisy: Practical Anomaly Detection in large BGP/MPLS and BGP/SRv6 VPN Networks”. In Proceedings of the Applied Networking Research Workshop (ANRW’23). Association for Computing Machinery, New York, NY, USA, 8-14. doi: 10.1145/3606464.3606470
- Alex Huang Feng, Pierre Francois, Kensuke Fukuda, Wanting Du, Thomas Graf, Paolo Lucente and Stéphane Frenot. 2024. “Practical Anomaly Detection in Internet Services: An ISP centric approach”. In Proceedings of IEEE/IFIP International Workshop on Analytics for Network and Service Management (AnNet’24). NOMS 2024 IEEE/IFIP Network Operations and Management Symposium, Seoul, Korea, 2024. doi: 10.1109/NOMS59830.2024.10575071
IETF Standards:
- Thomas Graf, Wanting Du, Pierre Francois, and Alex Huang Feng. “A Framework for a Network Anomaly Detection Architecture”. Internet Engineering Task Force, Internet-Draft draft-ietf-nmop-network-anomaly-architecture, May 2025, Work in Progress. https://datatracker.ietf.org/doc/draft-ietf-nmop-network-anomaly-architecture/
This project relies on Python 3 (tested on Python 3.9.7).
$ pip3 install -r src/requirements.txt
$ cp config.example.vpn.json config.json
$ cp .env.example .env
$ # Modify .env and config.json with your own values
- Check config.example.vpn.json and .env.example for examples.
- Example for the link saturation strategy: config.inet.lab.linksaturation.json
- Example for the VPN disruption detection strategy: config.example.vpn.json
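Before starting the engine, it can help to confirm that both copied files still parse after editing. The following is a minimal sketch and not part of the project: `parse_env_text` and `check_config_text` are hypothetical helpers. It assumes that `.env` uses plain `KEY=VALUE` lines with `#` comments and that `config.json` holds a JSON object at the top level. It makes no assumption about which keys the project expects.

```python
import json


def parse_env_text(text):
    """Parse KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected KEY=VALUE, got {raw!r}")
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_config_text(text):
    """Return the parsed config; raise ValueError if it is not a JSON object."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"config is not valid JSON: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object at the top level")
    return config
```

Reading `config.json` and `.env` from the repository root and passing their contents to these two functions would surface syntax mistakes before `main.py` is run.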
Notes:
data_delay: Data is sometimes cached before it is actually present in the database. In real-time mode, data_delay restricts requests to data that is already present, that is, data up to (now() - <data_delay> seconds).
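The effect of `data_delay` on a real-time query can be sketched as follows. This is an illustration, not the project's code: `query_window` and its `window_seconds` argument are hypothetical names, and the sketch assumes that each run queries a fixed-length window ending at `now() - data_delay` seconds.

```python
from datetime import datetime, timedelta, timezone


def query_window(now, data_delay, window_seconds):
    """Return (start, end) of the query window, shifted back by data_delay seconds."""
    if data_delay < 0 or window_seconds <= 0:
        raise ValueError("data_delay must be >= 0 and window_seconds > 0")
    # Stop the window data_delay seconds before "now" so that rows still
    # sitting in a cache are not expected from the database yet.
    end = now - timedelta(seconds=data_delay)
    start = end - timedelta(seconds=window_seconds)
    return start, end


if __name__ == "__main__":
    start, end = query_window(datetime.now(timezone.utc), data_delay=120, window_seconds=3600)
    print(f"query data from {start.isoformat()} to {end.isoformat()}")
```

With `data_delay` set to 0 the window ends at the current time, which is only safe when the data source has no caching lag.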
| parameter name | definition | data type | range |
|---|---|---|---|
| udp/tcp_sum_percent_threshold | Threshold on the relative difference between the current and historic UDP/TCP traffic sums above which a warning is raised. | float | 0.0~1.0 |
| udp_pattern_percent_threshold | Threshold on the difference between the current and historic gradients, relative to the historic traffic at a given point. | float | 0.0~1.0 |
| udp/tcp_sum_weight | Weight of concern score based on the UDP/TCP traffic sum comparison | float | 0.0~1.0 |
| udp_pattern_weight | Weight of concern score based on the UDP curve deviation level | float | 0.0~1.0 |
| tcp_spike_weight | Weight of concern score based on the TCP curve spike level | float | 0.0~1.0 |
| sensitivity | Coefficient of the non-linear function that maps the concern score into the range 0-1. The larger the coefficient, the more sensitive the function is to its initial value | float | 0.0~+inf |
| time_delta_hours | Time difference in hours between the current traffic and the historic traffic to compare | int | 0~+inf |
| pattern_length_minutes | Duration in minutes of the traffic pattern to be compared to its corresponding historic pattern | int | 0~+inf |
| volume_length_minutes | Duration in minutes of the traffic sum to be compared to its corresponding historic sum | int | 0~pattern_length_minutes |
| time_step_seconds | Aggregation step applied to the raw traffic; all traffic within one step is aggregated into a single time slot | int | 60~pattern_length_minutes*60 |
| flatten_step | Number of steps (time slots) used to flatten the aggregated curve; the larger the number, the smoother the processed curve. | int | 1~pattern_length_minutes/time_step_seconds/2-1 |
| spike_percent_threshold | Relative threshold of the prominence to be detected as a spike | float | 0.0~1.0 |
| spike_constant_threshold | Constant threshold of the prominence of the peak to be detected | float | 0.0~+inf |
| drop_percent_threshold | Threshold of the difference between the average traffic before and after the spike over the average traffic before. If traffic after spike is lower than (1-Threshold)*mean(traffic_before_spike), this is considered a traffic drop after spike | float | 0.0~1.0 |
| width | Maximum semi-width (number of time slots) of a peak to be considered | int | 1~pattern_length_minutes/time_step_seconds/2-1 |
| damp_coef | Damping ratio applied to old event counters per time slot | float | 0.0~1.0 |
| previous slots | Number of previous time slots (N) whose BMP event counts are included when computing the accumulated event count | int | 0~+inf |
| oc_monitor_minutes | Duration in minutes over which OC interface changes are monitored | int | 2~+inf |
| dropped_bytes_threshold | Constant threshold of the dropped bytes | int | 0~+inf |
| drop_model | Path to the dropped-bytes ML model file; if it is given and valid, the dropped-bytes check is ML-based | string | - |
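
Two of the parameters above lend themselves to a short worked example. The sketch below is illustrative only and is not taken from the project's source. `is_drop_after_spike` applies the `drop_percent_threshold` rule as worded in the table. `accumulated_events` shows one plausible reading of `damp_coef` and `previous slots`, assuming each older time slot's counter is multiplied by `damp_coef` once per slot of age. Both function names and the exact damping formula are assumptions.

```python
from statistics import mean


def is_drop_after_spike(before, after, drop_percent_threshold):
    """True if mean traffic after the spike is below (1 - threshold) * mean traffic before."""
    if not before or not after:
        raise ValueError("both traffic windows must be non-empty")
    return mean(after) < (1.0 - drop_percent_threshold) * mean(before)


def accumulated_events(counts, damp_coef, previous_slots):
    """Sum the current slot and the last N previous slots, damping older counters.

    counts is ordered oldest to newest; counts[-1] is the current slot.
    Assumed formula: sum(count * damp_coef ** age), with age 0 for the current slot.
    """
    recent = counts[-(previous_slots + 1):]
    return sum(c * damp_coef ** age for age, c in enumerate(reversed(recent)))
```

For instance, with `drop_percent_threshold` set to 0.5, traffic averaging 100 before a spike and 40 after it counts as a drop, because 40 is below 0.5 * 100. Traffic averaging 60 after the spike does not.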
Before running the project, the PostgreSQL schema must be in place. Run the scripts in scripts/database-migrations.
$ # After having modified config.json and .env files
$ cd src
$ python3 main.py
or
$ python3 main.py ../config.json ../.env

I would like to thank Pierre Francois for the brainstorming sessions that led to the development of this work. Additionally, I would like to thank Wanting Du and Severin Dellsperger for their contributions to this project.