Collaborator

@shubham-pampattiwar shubham-pampattiwar commented Oct 10, 2025

What this PR does / why we need it

Fixes the issue where Azure BackupStorageLocation status messages contain verbose HTTP response details and XML, making them difficult to read. This PR adds error sanitization to extract only the error code and meaningful message.

Which issue(s) this PR fixes

Fixes #8368

Before this change

When a BSL with a non-existent Azure bucket is created, the status message shows the full HTTP response:

BackupStorageLocation "test" is unavailable: rpc error: code = Unknown desc = GET https://oadp100711zl59k.blob.core.windows.net/oadp100711zl59k1
--------------------------------------------------------------------------------
RESPONSE 404: 404 The specified container does not exist.
ERROR CODE: ContainerNotFound
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
RequestId:63cf34d8-801e-0078-09b4-2e4682000000
Time:2024-11-04T12:23:04.5623627Z</Message></Error>
--------------------------------------------------------------------------------

After this change

The status message is clean and concise:

BackupStorageLocation "test" is unavailable: rpc error: code = Unknown desc = ContainerNotFound: The specified container does not exist.

Implementation details

  • Added sanitizeStorageError() function that detects Azure-style HTTP response errors (containing "RESPONSE" and "ERROR CODE:" patterns)
  • Extracts error code and meaningful message using regex
  • Preserves AWS/GCP-style errors unchanged
  • Maintains the gRPC error prefix structure

Testing

  • Added comprehensive unit tests covering various scenarios:
    • Nil errors
    • Simple/AWS-style errors (passed through unchanged)
    • Azure container not found with full HTTP response
    • Azure blob not found
    • Azure errors with plain text (no XML)
    • Azure errors with error code but no XML message
  • All existing tests continue to pass

Checklist

  • Accepted the DCO. Commits are signed-off.
  • Created a changelog file or comment /kind changelog-not-required on this PR.
  • Updated the corresponding documentation in site/content/docs/main (if applicable).

shubham-pampattiwar added a commit to shubham-pampattiwar/velero that referenced this pull request Oct 10, 2025
Signed-off-by: Shubham Pampattiwar <[email protected]>

codecov bot commented Oct 10, 2025

Codecov Report

❌ Patch coverage is 55.55556% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.14%. Comparing base (975f647) to head (20af2c2).
⚠️ Report is 55 commits behind head on main.

Files with missing lines | Patch % | Lines
...g/controller/backup_storage_location_controller.go | 55.55% | 22 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9321      +/-   ##
==========================================
- Coverage   60.14%   60.14%   -0.01%     
==========================================
  Files         385      385              
  Lines       35583    35635      +52     
==========================================
+ Hits        21403    21433      +30     
- Misses      12611    12632      +21     
- Partials     1569     1570       +1     

@ywk253100 ywk253100 requested review from anshulahuja98 and ywk253100 and removed request for blackpiglet October 17, 2025 06:11
  unavailableErrors = append(unavailableErrors, err.Error())
  location.Status.Phase = velerov1api.BackupStorageLocationPhaseUnavailable
- location.Status.Message = err.Error()
+ location.Status.Message = sanitizeStorageError(err)
Contributor

How about we implement the logic on the Azure plugin side? It is Azure-specific logic.

Collaborator Author

@ywk253100 I am unsure about that. The sanitization is about cleaning user-facing messages, not Azure-specific business logic.

Collaborator Author

@shubham-pampattiwar shubham-pampattiwar Nov 12, 2025

I thought about this, but I think the controller is the right place for a few reasons:

  • This is really about cleaning up the status message we show users, not about Azure-specific business logic. The controller owns the status field.
  • The function already checks if it looks like an Azure error and passes everything else through unchanged. So it won't break other providers.
  • If we put it in the plugin, we'd need to change the velero-plugin-for-microsoft-azure repo separately. And if GCP or AWS have similar issues later, we'd have to fix each plugin individually.
  • This also sets us up for the secret scrubbing enhancement that @anshulahuja98 suggested - easier to do in one place.

Does that make sense? What do you think? I'm open to suggestions.

Collaborator

I think the thought is that each storage plugin should contain its own function/status message cleanups, since that's why we have a plugin system in the first place.

But if the code is heavily reusable for secret scrubbing, then that's a good compromise.

Collaborator

priyansh17 commented Oct 21, 2025

func (s *objectBackupStore) IsValid() error

This returns a plain error object, which is used in the Reconciler's anonymous function.

func (r *backupStorageLocationReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error)

I noticed that logReconciledPhase logs the same error, for example:

Current BackupStorageLocations available/unavailable/unknown: 0/1/0, BackupStorageLocation "default" is unavailable: rpc error: code = Unknown desc = GET https://test01.blob.core.windows.net/az-blob-aks-test
--------------------------------------------------------------------------------
RESPONSE 403: 403 This request is not authorized to perform this operation using this permission.
ERROR CODE: AuthorizationPermissionMismatch
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?><Error><Code>AuthorizationPermissionMismatch</Code><Message>This request is not authorized to perform this operation using this permission.
RequestId:<GUID>
Time:2025-10-21T14:00:15.9852582Z</Message></Error>
--------------------------------------------------------------------------------
) :: {}

Will these get masked as well?

shubham-pampattiwar added a commit to shubham-pampattiwar/velero that referenced this pull request Nov 12, 2025
This commit addresses three review comments on PR vmware-tanzu#9321:

1. Keep sanitization in controller (response to @ywk253100)
   - Maintaining centralized error handling for easier extension
   - Azure-specific patterns detected and others passed through unchanged

2. Sanitize unavailableErrors array (@priyansh17)
   - Now using sanitizeStorageError() for both unavailableErrors array
     and location.Status.Message for consistency

3. Add SAS token scrubbing (@anshulahuja98)
   - Scrubs Azure SAS token parameters to prevent credential leakage
   - Redacts: sig, se, st, sp, spr, sv, sr, sip, srt, ss
   - Example: ?sig=secret becomes ?sig=***REDACTED***

Added comprehensive test coverage for SAS token scrubbing with 4 new
test cases covering various scenarios.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
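
For item 3 above, SAS token scrubbing could look roughly like the sketch below. The parameter list and the `***REDACTED***` placeholder come from the commit message; the function name `scrubSASTokens` and the exact regular expression are assumptions for illustration and may differ from the merged code.

```go
package main

import (
	"fmt"
	"regexp"
)

// sasParamRe matches one of the SAS query parameters named in the commit
// message when it directly follows '?' or '&'. Group 1 captures the delimiter,
// the parameter name, and '=' so that only the value is replaced.
var sasParamRe = regexp.MustCompile(`([?&](?:sig|se|st|sp|spr|sv|sr|sip|srt|ss)=)[^&\s]*`)

// scrubSASTokens is a hypothetical helper that redacts SAS parameter values
// anywhere in a message while leaving other query parameters untouched.
func scrubSASTokens(msg string) string {
	return sasParamRe.ReplaceAllString(msg, "${1}***REDACTED***")
}

func main() {
	in := "GET https://acct.blob.core.windows.net/c?sv=2022-11-02&sig=secret&comp=list"
	// Prints: GET https://acct.blob.core.windows.net/c?sv=***REDACTED***&sig=***REDACTED***&comp=list
	fmt.Println(scrubSASTokens(in))
}
```

Anchoring the parameter name to a preceding `?` or `&` avoids redacting unrelated parameters whose names merely end in one of the SAS keys.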
shubham-pampattiwar added a commit to shubham-pampattiwar/velero that referenced this pull request Nov 13, 2025
Signed-off-by: Shubham Pampattiwar <[email protected]>
@shubham-pampattiwar shubham-pampattiwar force-pushed the fix-azure-bsl-status-message-8368 branch 2 times, most recently from 50a445a to df6cf07 Compare November 13, 2025 05:08
shubham-pampattiwar added a commit to shubham-pampattiwar/velero that referenced this pull request Nov 13, 2025
Signed-off-by: Shubham Pampattiwar <[email protected]>
shubham-pampattiwar added a commit to shubham-pampattiwar/velero that referenced this pull request Nov 13, 2025
Signed-off-by: Shubham Pampattiwar <[email protected]>
shubham-pampattiwar and others added 3 commits December 2, 2025 11:37
Azure storage errors include verbose HTTP response details and XML
in error messages, making the BSL status.message field cluttered
and hard to read. This change adds sanitization to extract only
the error code and meaningful message.

Before:
  BackupStorageLocation "test" is unavailable: rpc error: code = Unknown
  desc = GET https://...
  RESPONSE 404: 404 The specified container does not exist.
  ERROR CODE: ContainerNotFound
  <?xml version="1.0"...>

After:
  BackupStorageLocation "test" is unavailable: rpc error: code = Unknown
  desc = ContainerNotFound: The specified container does not exist.

AWS and GCP error messages are preserved as-is since they don't
contain verbose HTTP responses.

Fixes vmware-tanzu#8368

Signed-off-by: Shubham Pampattiwar <[email protected]>
Signed-off-by: Shubham Pampattiwar <[email protected]>
Signed-off-by: Shubham Pampattiwar <[email protected]>
@shubham-pampattiwar shubham-pampattiwar force-pushed the fix-azure-bsl-status-message-8368 branch from df6cf07 to 20af2c2 Compare December 2, 2025 19:38
@shubham-pampattiwar
Collaborator Author

Review request reminder: @ywk253100 @priyansh17 @kaovilai

Collaborator

@priyansh17 priyansh17 left a comment

LGTM, Thanks

@shubham-pampattiwar shubham-pampattiwar merged commit 14b34f0 into vmware-tanzu:main Dec 12, 2025
52 of 53 checks passed
Development

Successfully merging this pull request may close these issues.

BSL status.message field shouldn't have the http response as output when bucket doesn't exist

5 participants