|
| 1 | +- Start Date: 2022-10-27 |
| 2 | +- RFC Type: feature |
| 3 | +- RFC PR: <link> |
| 4 | +- RFC Status: draft |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +**Table of Contents** |
| 9 | + |
| 10 | +- [Summary](#summary) |
| 11 | +- [Motivation](#motivation) |
| 12 | + - [Why?](#why-) |
| 13 | + - [Expected Outcome](#expected-outcome) |
| 14 | + - [User Experience](#user-experience) |
| 15 | + - [Token Format](#token-format) |
| 16 | + - [Backend Token Hygiene](#backend-token-hygiene) |
| 17 | + - [Secret Verification Service](#secret-verification-service) |
| 18 | +- [Background](#background) |
| 19 | +- [Options Considered](#options-considered) |
| 20 | + - [Option #1](#option--1) |
| 21 | + - [`n+1`](#-n-1-) |
| 22 | + - [`n+2`](#-n-2-) |
| 23 | + - [Option #2](#option--2) |
| 24 | + - [Option #3](#option--3) |
| 25 | +- [Drawbacks](#drawbacks) |
| 26 | + - [Option #2](#option--2-1) |
| 27 | + - [Option #3](#option--3-1) |
| 28 | +- [Unresolved questions](#unresolved-questions) |
| 29 | + |
| 30 | +--- |
| 31 | + |
| 32 | +# Summary |
| 33 | + |
| 34 | +Improve on Sentry's API token implementation to a more secure pattern. We will have three major goals: |
| 35 | + |
| 36 | +1. Using hashed values in the database |
| 37 | +2. Only display the token once to the end user upon creation |
| 38 | +3. Allow users to _name_ the tokens ([#9600](https://github.com/getsentry/sentry/issues/9600)) |
| 39 | +4. Use a predictable prefix to integrate with various secret scanning services (ex. Github's Secret Scanning) |
| 40 | + |
| 41 | +# Motivation |
| 42 | + |
| 43 | +## Why? |
| 44 | + |
| 45 | +Sentry currently provides several strong options to secure a user's account, including SSO, SAML, and 2FA options. However our approach to handling API tokens is not as mature. We have two major issues with the current API token implementation: |
| 46 | + |
| 47 | +1. Tokens are stored as plaintext in the database, increasing the risk of exposure |
| 48 | +2. Tokens are visible as plaintext in the Sentry UI |
| 49 | + |
| 50 | +As a result, Sentry has to take extra steps to ensure the confidentially of API tokens (for example, [PR #39254](https://github.com/getsentry/sentry/pull/39254)). |
| 51 | + |
| 52 | +## Expected Outcome |
| 53 | + |
| 54 | +### User Experience |
| 55 | + |
| 56 | +When a user creates an API token, they **have the option to provide it an arbitrary name** and **the actual token value is only displayed to them once**. |
| 57 | + |
| 58 | +Existing API tokens, should be hashed and their original value discarded. |
| 59 | + |
| 60 | +A notice in the Sentry UI should be presented to suggest the user rotate and generate a new token to the new and improved version. |
| 61 | + |
| 62 | +### Token Format |
| 63 | + |
| 64 | +We need a predictable token format in order to integrate properly with secret scanning services. Our current format is a 64 character alphanumeric string. This is insufficient and would likely produce a high amount of false positives in tooling like [TruffleHog](https://github.com/trufflesecurity/trufflehog), [detect-secrets](https://github.com/Yelp/detect-secrets), [Github's Secret Scanning](https://docs.github.com/en/code-security/secret-scanning/about-secret-scanning), etc. |
| 65 | + |
| 66 | +The suggested pattern is `snty[a-zA-Z]_[a-zA-Z0-9]{64}`. The character _after_ `snty` will be used to identify the token type. |
| 67 | + |
| 68 | +For example: |
| 69 | + |
| 70 | +- `sntyu_a1b2c3...` - identifies a **user** API token |
| 71 | +- `sntya_a1b2c3...` - identifies an **API Application** token |
| 72 | + |
| 73 | +### Backend Token Hygiene |
| 74 | + |
| 75 | +API tokens should be stored in the database with a cryptographically secure hash (minimum SHA-256). Salting is not necessary since these values are sufficient in length. |
| 76 | + |
| 77 | +An additional column identifying the API token version will be added in order to correctly identify legacy API tokens. |
| 78 | + |
| 79 | +### Secret Verification Service |
| 80 | + |
| 81 | +In order to integrate with Github's Secret Scanning service, we will need to have an available service for them to submit potential matches to. |
| 82 | + |
| 83 | +More details about Github's secret scanning program can be found [here](https://docs.github.com/en/developers/overview/secret-scanning-partner-program). |
| 84 | + |
| 85 | +At a high-level, we will need to be able to: |
| 86 | + |
| 87 | +- Receive a list of one or more potential tokens |
| 88 | +- [Verify the signature of the request](https://docs.github.com/en/developers/overview/secret-scanning-partner-program#implement-signature-verification-in-your-secret-alert-service) |
| 89 | +- Determine if the token is a true positive or a false positive |
| 90 | + - send this feedback back to Github |
| 91 | +- Disable/revoke the compromised token |
| 92 | +- Send a notification to the appropriate user |
| 93 | + |
| 94 | +> :memo: This secret verification service is not an immediate need or a dependency of implementing the improved API tokens. |
| 95 | +
|
| 96 | +# Background |
| 97 | + |
| 98 | +Accidental credential leaks [happen](https://github.com/ChALkeR/notes/blob/master/Do-not-underestimate-credentials-leaks.md). Even though we provide several tools for users to limit the storage of sensitive information in Sentry it can still happen. |
| 99 | + |
| 100 | +Our support for various authentication methods (including 2FA, SSO, and SAML) helps mitigate access of potentially sensitive information, but API tokens cannot support these additional authentication gates. |
| 101 | + |
| 102 | +We can help customers further protect their accounts and data by providing a means of auto-detecting leaked API tokens. |
| 103 | + |
| 104 | +# Options Considered |
| 105 | + |
| 106 | +## Option #1 |
| 107 | + |
| 108 | +The migration from the legacy tokens to the new tokens should be rolled out over X releases. This allows for a smooth transition of self-hosted instances keeping pace with the versions. |
| 109 | + |
| 110 | +> With `n` being the current version. |
| 111 | +
|
| 112 | +### `n+1` |
| 113 | + |
| 114 | +- Add a Django migration including the `hashed_token`, `name`, and `token_version` fields. |
| 115 | + - columns should allow `null` values for now. |
| 116 | +- Remove displaying the plaintext token in the frontend, and only display newly created tokens once. |
| 117 | +- Implement new logic to generate the new token pattern. |
| 118 | + - The `token_version` should be set as `2`. |
| 119 | +- Implement a backwards compatible means of verifying legacy tokens. This should: |
| 120 | + - Verify the legacy token is valid/invalid. |
| 121 | + - If valid, hash the legacy token and store it in the table. |
| 122 | + - Remove the plaintext representation of the token in the table. |
| 123 | + - Mark the `token_version` as `1`. |
| 124 | +- A notification/warning in the UI should be displayed recommending users recreate their tokens, resulting in the new token version. |
| 125 | + |
| 126 | +The `n+1` Sentry version should run in production for sufficient time to update the majority of the rows in the database as the legacy tokens are _used_ by users. |
| 127 | + |
| 128 | +### `n+2` |
| 129 | + |
| 130 | +- Add a Django migration to generate the `hashed_token` value for any tokens in `sentry_apitoken` that do not already have a hashed value. |
| 131 | + - Rows matching this are legacy tokens that were not used during the transition period. |
| 132 | + - The `token_version` should be set to `1` for these rows. |
| 133 | + |
| 134 | +## Option #2 |
| 135 | + |
| 136 | +Instead of slowly generating the hashed token values over time of the legacy tokens (as in Option #1), we could generate a single migration that migrates the entire table to the end state. |
| 137 | + |
| 138 | +## Option #3 |
| 139 | + |
| 140 | +To avoid the two different token versions, we could automatically prepend the prefix `sntyx_` (with `x` just being a placeholder here). We would then follow a similar approach to Option #1 or Option #2 to generate the hashed values. |
| 141 | + |
| 142 | +# Drawbacks |
| 143 | + |
| 144 | +## Option #2 |
| 145 | + |
| 146 | +- This is less than ideal because of the potential performance impact while the migration mutates each row in the table. Potential impacts to self-hosted installs would be unknown as well. |
| 147 | + |
| 148 | +## Option #3 |
| 149 | + |
| 150 | +- Users would not be able to benefit from the Github Secrets Scanning since they would still be using their 64 character alphanumeric string without the prefix. |
| 151 | +- Authentication views/logic would become more complex with branching code to handle the with and without prefix cases. |
| 152 | + |
| 153 | +# Unresolved questions |
| 154 | + |
| 155 | +- Implementation path for `APIApplication` secrets. |
| 156 | +- Is there a downside to allowing `null` in the additional columns? |
| 157 | +- What is the median time between token use? |
| 158 | + - _This value could be used to inform how long we wait between versions for the migration that will edit pending rows in the database._ |
| 159 | +- Do we want to support [API Keys](https://docs.sentry.io/api/auth/#api-keys) as well, even though they are considered a legacy means of authentication? |
| 160 | +- Are there additional secret types we would like to apply this to? |
| 161 | +- Should the secret verification service live within the monolith or run as a separate service entirely? |
| 162 | + - _Github [mentions that there could be large batches of secrets sent for verification](https://docs.github.com/en/developers/overview/secret-scanning-partner-program#create-a-secret-alert-service:~:text=Your%20endpoint%20should%20be%20able%20to%20handle%20requests%20with%20a%20large%20number%20of%20matches%20without%20timing%20out.). It might make sense to keep this as a separate service to avoid impact._ |
0 commit comments