Skip to content

aws-samples/pseudonymization-service

Building a pseudonymization service on AWS to protect sensitive data

Data is changing the world and this fact makes this resource even more valuable than oil. Given the importance of this new asset law makers are keen to protect the privacy of individuals and prevent any misuse. Organisations often face challenges as they aim to comply with data privacy regulations like Europe's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). These regulations demand strict access controls to protect sensitive personal data. This project shows a solution which uses a microservice based approach to enable fast and cost-effective pseudonymization of data sets. The solution relies on AES-GCM-SIV algorithm to Psedonymize sensitive data.

Blogpost URL

Part 1: Build a pseudonymization service on AWS to protect sensitive data

Solution Overview

The solution follows a serverless architecture approach. Pseudonymization logic is written in java and leverages the java implementation of AES-GCM-SIV developed by codahale. The source code is hosted in an AWS Lambda lambda function. Secret keys are stored securely on AWS Secrets Manager. AWS Key Management System ensures that secrets and sensitive components are protected at rest. The service is exposed to consumers via Amazon API Gateway as a REST Interface. Consumers are authenticated and authorized to consume the endpoints via API Keys. The solution per se is technology agnostic, it can be addopted by any form of consumer as long as they are able to consume REST Interfaces.

Architecture

Alt text

Resource Deployment

The cloudformation stack will create the following resources:

  • API Gateway REST Interface with 2 resources
  • Lambda Function acting as the API integration
  • Secrets Manager Secret
  • KMS Key
  • IAM Roles & Policies
  • CloudWatch Logs Group

Parameters:

  • STACK_NAME - CloudFormation stack name
  • AWS_REGION - AWS region where the solution will be deployed
  • AWS_PROFILE - Named profile that will apply to the AWS CLI command
  • ARTEFACT_S3_BUCKET - S3 bucket where the infrastructure code will be stored. (The bucket must be created in the same region where the solution lives)

Outputs:

  • PseudonymizationUrl
  • ReidentificationUrl
  • KmsKeyArn
  • SecretName

Deployment Commands

All deployments are done using bash scripts, in this case we use the following commands:

  • ./deployment_scripts/deploy.sh - Packages, builds and deploys the local artifacts that your AWS CloudFormation template (e.g: cfn_template.yaml) is referencing

    ./deployment_scripts/deploy.sh -s STACK_NAME \
    -b ARTEFACT_S3_BUCKET -r AWS_REGION \
    -p AWS_PROFILE
  • ./deployment_scripts/destroy.sh - Destroys the CloudFormation Stack you created in the deployment above (e.g: cfnstackdeployment)

    ./deployment_scripts/destroy.sh -s STACK_NAME \
    -p AWS_PROFILE -r AWS_REGION

Generate and store secrets in Secrets Manager

Run the python script helper_scripts\key_generator.py to generate the encryption keys via KMS and persist them in Secrets Manager.

python ./helper_scripts/key_generator.py \
-k KmsKeyArn -s SecretName -r AWS_REGION \
-p AWS_PROFILE 

Test

You may test the solution via postman. In the test folder you can find the postman collection json file with all the necessary configurations to call the REST endpoints. Once imported make sure to fill the variables in the collection. All values will be outputted from ./deployment_scripts/deploy.sh, except for the API_KEY which you have to fetch from the API Gateway console and the deterministic one which it's up to you to set it to True or False.

Prerequisites

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

About

No description, website, or topics provided.

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •