Skip to content

krimler/alignDP

Repository files navigation

ALIGNDP: A Selective Differential Privacy Algorithm for Feedback Learning

Project Overview

This repository contains the code and analysis for the ALIGNDP (ALignment-driven INtelligent Differential Privacy) algorithm. ALIGNDP is a novel server-side differential privacy (DP) mechanism designed to improve the privacy-utility trade-off in systems that learn from user feedback.

Unlike existing Local Differential Privacy (LDP) methods (e.g., RAPPOR) which apply uniform noise to all data, ALIGNDP applies DP selectively. It identifies and privatizes only "rare" or anomalous feedback events, while utilizing regular feedback directly without distortion. This approach aims to provide a higher-quality signal for model training while consuming less privacy budget.

Core Hypothesis

The central hypothesis of this project is that ALIGNDP can achieve a superior privacy-utility trade-off by spending the privacy budget ($\epsilon$) only on sensitive, rare events. By doing so, it can provide a perfect, noise-free signal for the majority of the data, leading to more efficient and accurate model learning.

Methodology

Experiment Design

To validate our hypothesis, we designed an experiment to compare ALIGNDP against a uniform LDP baseline. We used a CSV file to simulate real-world, imbalanced feedback data and ran the comparison over 100 simulations to provide a statistically robust analysis.

  • Data Source: align_dp_feedback_data.csv, a local file containing user feedback categories.
  • Algorithms:
    • Uniform LDP (RAPPOR-like): Applies randomized response uniformly to all feedback types.
    • ALIGNDP (Selective DP): Applies Laplace noise only to feedback configured as "rare". In this experiment, "override" was designated as the rare event type.
  • Metrics: We measured the average relative error for each feedback category and the total dataset over the 100 runs.

How to Run

  1. Dependencies: Ensure you have the necessary libraries installed.
    pip install numpy
  2. Data File: Create a CSV file named align_dp_feedback_data.csv in your project's root directory. It should have a single column titled Feedback containing the categories you want to test (e.g., "dislike", "like", "override"). See Example file in repo. You can even just use this.
  3. Code: Use the Python script that contains the simulation logic for both algorithms.
  4. Execute: Run the script from your terminal.
    python aligndp_csv_validation.py

Key Findings (Final Results)

The average results over 100 runs conclusively demonstrate ALIGNDP's superior performance. The table below compares the average relative error of each algorithm.

Feedback Category Uniform LDP (Avg. Error) ALIGNDP (Avg. Error)
dislike 0.1352 0.0000
like 0.1069 0.0000
override (Rare) 0.0946 0.2606
Total Dataset 0.0000* 0.0912

* The total error for the uniform LDP algorithm averages out to near zero over many runs, as the added noise tends to cancel out in aggregate. However, the individual category counts remain consistently distorted.

Conclusion

The results provide robust, empirical evidence that ALIGNDP is a superior approach for privacy-preserving feedback systems.

  • Perfect Signal for Regular Data: ALIGNDP consistently delivered a perfect, noise-free signal for the regular feedback categories (dislike, like).
  • Targeted Privacy: It successfully applied a controlled amount of noise to the designated rare event (override), proving that the privacy mechanism is active and effective only where needed.
  • Optimal Utility: By preserving the integrity of the most frequent data, ALIGNDP achieves a significantly lower overall data error, leading to a higher-fidelity training signal and a more efficient use of the privacy budget compared to traditional uniform LDP methods.

Legal and Compliance Note

While ALIGNDP is a server-side privacy mechanism, its use of a mathematically-proven differential privacy technique for anonymization provides a strong technical basis for arguing GDPR compliance. This approach achieves irreversible anonymization of sensitive data, which is a key goal of data privacy regulations.

About

Selective Differential Privacy Algorithm for Feedback Learning

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages