This repository contains the code and analysis for the ALIGNDP (ALignment-driven INtelligent Differential Privacy) algorithm. ALIGNDP is a novel server-side differential privacy (DP) mechanism designed to improve the privacy-utility trade-off in systems that learn from user feedback.
Unlike existing Local Differential Privacy (LDP) methods (e.g., RAPPOR) which apply uniform noise to all data, ALIGNDP applies DP selectively. It identifies and privatizes only "rare" or anomalous feedback events, while utilizing regular feedback directly without distortion. This approach aims to provide a higher-quality signal for model training while consuming less privacy budget.
The central hypothesis of this project is that ALIGNDP can achieve a superior privacy-utility trade-off by spending the privacy budget (
To validate our hypothesis, we designed an experiment to compare ALIGNDP against a uniform LDP baseline. We used a CSV file to simulate real-world, imbalanced feedback data and ran the comparison over 100 simulations to provide a statistically robust analysis.
- Data Source:
align_dp_feedback_data.csv, a local file containing user feedback categories. - Algorithms:
- Uniform LDP (RAPPOR-like): Applies randomized response uniformly to all feedback types.
- ALIGNDP (Selective DP): Applies Laplace noise only to feedback configured as "rare". In this experiment,
"override"was designated as the rare event type.
- Metrics: We measured the average relative error for each feedback category and the total dataset over the 100 runs.
- Dependencies: Ensure you have the necessary libraries installed.
pip install numpy
- Data File: Create a CSV file named
align_dp_feedback_data.csvin your project's root directory. It should have a single column titledFeedbackcontaining the categories you want to test (e.g.,"dislike","like","override"). See Example file in repo. You can even just use this. - Code: Use the Python script that contains the simulation logic for both algorithms.
- Execute: Run the script from your terminal.
python aligndp_csv_validation.py
The average results over 100 runs conclusively demonstrate ALIGNDP's superior performance. The table below compares the average relative error of each algorithm.
| Feedback Category | Uniform LDP (Avg. Error) | ALIGNDP (Avg. Error) |
|---|---|---|
| dislike | 0.1352 | 0.0000 |
| like | 0.1069 | 0.0000 |
| override (Rare) | 0.0946 | 0.2606 |
| Total Dataset | 0.0000* | 0.0912 |
* The total error for the uniform LDP algorithm averages out to near zero over many runs, as the added noise tends to cancel out in aggregate. However, the individual category counts remain consistently distorted.
The results provide robust, empirical evidence that ALIGNDP is a superior approach for privacy-preserving feedback systems.
- Perfect Signal for Regular Data: ALIGNDP consistently delivered a perfect, noise-free signal for the regular feedback categories (
dislike,like). - Targeted Privacy: It successfully applied a controlled amount of noise to the designated rare event (
override), proving that the privacy mechanism is active and effective only where needed. - Optimal Utility: By preserving the integrity of the most frequent data, ALIGNDP achieves a significantly lower overall data error, leading to a higher-fidelity training signal and a more efficient use of the privacy budget compared to traditional uniform LDP methods.
While ALIGNDP is a server-side privacy mechanism, its use of a mathematically-proven differential privacy technique for anonymization provides a strong technical basis for arguing GDPR compliance. This approach achieves irreversible anonymization of sensitive data, which is a key goal of data privacy regulations.