@puretension
Contributor

What this PR does / why we need it:

This PR fixes the Loki operator's overly strict S3 endpoint validation that was rejecting private VPC S3 endpoints. The operator was only accepting the standard AWS S3 endpoint format (https://s3.region.amazonaws.com) and failing to reconcile when users configured private VPC endpoints in OpenShift environments.

The fix updates the validateS3Endpoint function to also accept VPC endpoint formats:

  • https://bucket.vpce-*-region.s3.region.vpce.amazonaws.com (bucket-specific VPC endpoint)
  • https://vpce-*-region.s3.region.vpce.amazonaws.com (general VPC endpoint)

Which issue(s) this PR fixes:
Fixes #19243

Special notes for your reviewer:

This change maintains full backward compatibility with existing standard AWS S3 endpoints while adding support for VPC endpoints. The validation logic now:

  1. First checks for standard AWS S3 endpoint format
  2. Then checks for VPC endpoint patterns (contains .vpce.amazonaws.com and the specified region)
  3. Only rejects endpoints that match neither pattern

The fix includes comprehensive test cases covering both VPC endpoint formats to ensure the validation works correctly.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added (not applicable - internal validation logic)
  • Tests updated (added comprehensive test cases for VPC endpoints)
  • Title matches the required conventional commits format, see here
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md (not applicable - this is a bug fix that enables previously broken functionality)
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively (not applicable - no configuration changes)

Changes Made

Code Changes

  • Modified operator/internal/handlers/internal/storage/secrets.go:
    • Updated validateS3Endpoint() function to support VPC endpoint validation
    • Added logic to detect VPC endpoint patterns
    • Maintained backward compatibility with standard AWS endpoints

Test Changes

  • Added comprehensive test cases in operator/internal/handlers/internal/storage/secrets_test.go:
    • Test case for bucket-specific VPC endpoint format
    • Test case for general VPC endpoint format
    • Both test cases verify correct configuration parsing

Validation Logic

Before (rejected VPC endpoints):

validEndpoint := fmt.Sprintf("https://s3.%s%s", region, awsEndpointSuffix)
if endpoint != validEndpoint {
   return fmt.Errorf("%w: %s", errS3EndpointAWSInvalid, validEndpoint)
}

After (accepts both standard and VPC endpoints):

// Check standard AWS S3 endpoint
validEndpoint := fmt.Sprintf("https://s3.%s%s", region, awsEndpointSuffix)
if endpoint == validEndpoint {
   return nil
}

// Check VPC endpoint format
if strings.Contains(endpoint, ".vpce.amazonaws.com") && strings.Contains(endpoint, region) {
   return nil
}

// Reject if neither format matches
return fmt.Errorf("%w: %s", errS3EndpointAWSInvalid, validEndpoint)
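
To see what the substring-based check above accepts, here is a minimal, self-contained sketch. The values of awsEndpointSuffix and errS3EndpointAWSInvalid, and the sample endpoint IDs, are assumptions for illustration; only the function body follows the snippet above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Assumed values for illustration; the real constants live in the operator.
const awsEndpointSuffix = ".amazonaws.com"

var errS3EndpointAWSInvalid = errors.New("endpoint for AWS S3 must include correct region")

// validateS3Endpoint reproduces the substring-based check from the snippet above.
func validateS3Endpoint(endpoint, region string) error {
	// Check standard AWS S3 endpoint
	validEndpoint := fmt.Sprintf("https://s3.%s%s", region, awsEndpointSuffix)
	if endpoint == validEndpoint {
		return nil
	}

	// Check VPC endpoint format (substring match only)
	if strings.Contains(endpoint, ".vpce.amazonaws.com") && strings.Contains(endpoint, region) {
		return nil
	}

	// Reject if neither format matches
	return fmt.Errorf("%w: %s", errS3EndpointAWSInvalid, validEndpoint)
}

func main() {
	region := "us-east-1"
	// Hypothetical endpoints; the vpce IDs are made up.
	endpoints := []string{
		"https://s3.us-east-1.amazonaws.com",
		"https://vpce-0abc123-xyz.s3.us-east-1.vpce.amazonaws.com",
		"https://bucket.vpce-0abc123-xyz.s3.us-east-1.vpce.amazonaws.com",
		"https://s3.eu-west-1.amazonaws.com",
	}
	for _, e := range endpoints {
		fmt.Printf("%-70s accepted=%t\n", e, validateS3Endpoint(e, region) == nil)
	}
}
```

Because the check only looks for substrings, it accepts both the general and the bucket-prefixed VPC form and does not inspect where in the URL the region appears.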

Impact

This fix enables users to successfully deploy Loki in OpenShift environments with private VPC S3 endpoints, resolving the operator reconciliation failures described in issue #19243.

Collaborator

@JoaoBraveCoding JoaoBraveCoding left a comment

First of all thank you so much for contributing to the project! 🙏 It's much appreciated!

Overall it looks good. Just one small comment, and could you rework the commit message/PR title so it reads fix(operator): ...? That way your change is included in the release notes.


// Check if it's a VPC endpoint format: https://bucket.vpce-*-region.s3.region.vpce.amazonaws.com
// or https://vpce-*-region.s3.region.vpce.amazonaws.com
if strings.Contains(endpoint, ".vpce.amazonaws.com") && strings.Contains(endpoint, region) {
Collaborator

@xperimental might correct me here, but IIRC one of the reasons for introducing this validation was to prevent users from using endpoints that look like https://bucketName.endpoint.com, because this caused problems with users creating folders inside of buckets. So my suggestion would be to improve this so that we allow vpce endpoints but not endpoints that include the bucket name.

Contributor Author

@JoaoBraveCoding Thanks for the feedback! I've updated both the commit message to use the fix(operator): format and improved the VPC endpoint validation logic.

You're absolutely right about the bucket name issue. I've changed the validation to only allow general VPC endpoints (vpce-) and reject bucket-specific ones (bucket.vpce-) to prevent the folder creation problems. The hostname check now ensures we only accept endpoints that start with "vpce-" rather than allowing any VPC endpoint format.

All tests are passing including a new test case that confirms bucket-specific VPC endpoints get rejected properly. Ready for another look!
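
A rough sketch of the hostname check described in this comment, assuming the endpoint is parsed with net/url. The helper name isGeneralVPCEndpoint and the sample endpoint IDs are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isGeneralVPCEndpoint is a hypothetical helper. It reports whether the
// endpoint's hostname starts with "vpce-", ends with ".vpce.amazonaws.com",
// and contains the region as a full host label.
func isGeneralVPCEndpoint(endpoint, region string) bool {
	parsedURL, err := url.Parse(endpoint)
	if err != nil {
		return false
	}
	host := parsedURL.Hostname()
	return strings.HasPrefix(host, "vpce-") &&
		strings.HasSuffix(host, ".vpce.amazonaws.com") &&
		strings.Contains(host, "."+region+".")
}

func main() {
	region := "us-east-1"
	// Made-up endpoint IDs for illustration.
	for _, e := range []string{
		"https://vpce-0abc123-xyz.s3.us-east-1.vpce.amazonaws.com",
		"https://bucket.vpce-0abc123-xyz.s3.us-east-1.vpce.amazonaws.com",
	} {
		fmt.Printf("%s general=%t\n", e, isGeneralVPCEndpoint(e, region))
	}
}
```

Checking the parsed hostname instead of the raw string means a prefix in front of "vpce-" is no longer accepted.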

Collaborator

Thank you for the quick fix! Now the only thing left is changing it so it matches the error flow of the rest of the function. In Go it's generally good practice to make the error cases the branching paths of a function. This will also allow us to return better error messages. Take, for instance, the first test case you added: the problem with the config is that the bucket name is in the endpoint, but the error we return doesn't say that.

Contributor Author

@puretension puretension Sep 24, 2025

@JoaoBraveCoding Thanks for the detailed feedback and the Go style guide reference.👍🏻

I've restructured the validation function to follow Go's "Indent Error Flow" pattern with error cases
handled first and early returns.
Also refactored the code to use shorter variable names (u, host) and clearer comments following Go conventions.
The bucket-specific VPC endpoint now returns a much more specific error message that directly tells users
what's wrong with their configuration.

Thanks again for the link. That error flow documentation is going to be useful for future code contributions!
All tests are passing and ready for another look. 🙏
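
For readers following along, the early-return error flow discussed here might look roughly like the sketch below. The error variables, the function name validateVPCEndpoint, and the messages are hypothetical and are not the ones in the PR.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// Hypothetical sentinel errors, one per failure mode.
var (
	errEndpointUnparseable   = errors.New("endpoint cannot be parsed")
	errEndpointNotVPC        = errors.New("endpoint is not a VPC endpoint")
	errVPCEndpointBucketName = errors.New("VPC endpoint must not include the bucket name")
	errVPCEndpointRegion     = errors.New("VPC endpoint does not match the configured region")
)

// validateVPCEndpoint handles each error case first and returns early,
// so the success path stays unindented at the end of the function.
func validateVPCEndpoint(endpoint, region string) error {
	parsedURL, err := url.Parse(endpoint)
	if err != nil {
		return fmt.Errorf("%w: %v", errEndpointUnparseable, err)
	}

	host := parsedURL.Hostname()
	if !strings.HasSuffix(host, ".vpce.amazonaws.com") {
		return fmt.Errorf("%w: %s", errEndpointNotVPC, host)
	}
	if !strings.HasPrefix(host, "vpce-") {
		return fmt.Errorf("%w: %s", errVPCEndpointBucketName, host)
	}
	if !strings.Contains(host, "."+region+".") {
		return fmt.Errorf("%w: expected region %s", errVPCEndpointRegion, region)
	}

	return nil
}

func main() {
	// Made-up endpoint ID for illustration.
	err := validateVPCEndpoint("https://bucket.vpce-0abc123-xyz.s3.us-east-1.vpce.amazonaws.com", "us-east-1")
	fmt.Println(errors.Is(err, errVPCEndpointBucketName), err)
}
```

With one sentinel per failure mode, a test can assert on the specific cause with errors.Is, and the user sees a message that names what is wrong with the configuration.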

@puretension puretension changed the title fix: support private VPC S3 endpoints in Loki operator fix(operator): support private VPC S3 endpoints in Loki operator Sep 23, 2025
@puretension puretension force-pushed the fix/support-vpc-s3-endpoint-19243 branch from fb41bd0 to 12a934a Compare September 23, 2025 10:14
This PR fixes the Loki operator's overly strict S3 endpoint validation that was rejecting private VPC S3 endpoints. The operator was only accepting the standard AWS S3 endpoint format and failing to reconcile when users configured private VPC endpoints in OpenShift environments.

The fix updates the validateS3Endpoint function to accept VPC endpoint formats while preventing bucket-specific VPC endpoints that could cause folder creation issues:
- Allows: https://vpce-*-region.s3.region.vpce.amazonaws.com (general VPC endpoint)
- Rejects: https://bucket.vpce-*-region.s3.region.vpce.amazonaws.com (bucket-specific VPC endpoint)

This maintains full backward compatibility with existing standard AWS S3 endpoints while adding support for VPC endpoints without the bucket name prefix that could lead to unwanted folder creation in S3 buckets.

Fixes grafana#19243

Signed-off-by: puretension <[email protected]>
@puretension puretension force-pushed the fix/support-vpc-s3-endpoint-19243 branch from 12a934a to 3f4bf2c Compare September 23, 2025 13:49
- Follow Go's 'Indent Error Flow' pattern by handling error cases first
- Provide more specific error message for bucket-specific VPC endpoints
- Maintain backward compatibility with existing validation logic
- Update test to reflect clearer error messaging

Signed-off-by: puretension <[email protected]>
- Use shorter variable names following Go conventions (u, host)
- Add clear English comments for each validation section
- Improve function signature with grouped parameters
- Maintain same validation logic and error handling

Signed-off-by: puretension <[email protected]>
puretension and others added 2 commits September 25, 2025 18:35
- Revert variable name from 'u' to 'parsedURL' for better readability
- Keep AWS validation logic within conditional block structure
- Define VPC bucket name error as constant for consistency

Signed-off-by: puretension <[email protected]>
Collaborator

@JoaoBraveCoding JoaoBraveCoding left a comment

@xperimental and I briefly discussed your PR internally, and he suggested trying a regex-based validation. Based on your PR I made a commit and opened puretension#1. While I was at it, I also tested this against an actual VPC bucket to check that we are adding the right support for the endpoint. I think the only thing we might be missing is support for accesspoint and control endpoints, not only bucket, but today I only had time to test bucket. I took this info from https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html#accessing-bucket-and-aps-from-interface-endpoints.
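
As an illustration of what a regex-based validation could look like, here is a minimal sketch. The pattern, error variables, and function name are assumptions and are not the regex from puretension#1; the sketch follows the allow/reject rule stated in the commit message earlier in this thread and leaves the accesspoint and control endpoint types aside.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// vpcEndpointPattern is an illustrative pattern only.
// Group 1: any host labels in front of "vpce-" (for example a bucket name).
// Group 2: the region label.
var vpcEndpointPattern = regexp.MustCompile(
	`^https://((?:[a-z0-9-]+\.)*)vpce-[a-z0-9]+-[a-z0-9]+\.s3\.([a-z0-9-]+)\.vpce\.amazonaws\.com$`)

// Hypothetical sentinel errors.
var (
	errNotVPCEndpoint   = errors.New("endpoint does not match the VPC endpoint format")
	errBucketInEndpoint = errors.New("VPC endpoint must not include a prefix before vpce-")
	errRegionMismatch   = errors.New("VPC endpoint region does not match the configured region")
)

// validateVPCEndpointRegex matches the whole endpoint once and then
// inspects the captured groups to report a specific error.
func validateVPCEndpointRegex(endpoint, region string) error {
	m := vpcEndpointPattern.FindStringSubmatch(endpoint)
	if m == nil {
		return errNotVPCEndpoint
	}
	if m[1] != "" {
		return fmt.Errorf("%w: %q", errBucketInEndpoint, m[1])
	}
	if m[2] != region {
		return fmt.Errorf("%w: got %q, want %q", errRegionMismatch, m[2], region)
	}
	return nil
}

func main() {
	// Made-up endpoint IDs for illustration.
	for _, e := range []string{
		"https://vpce-0abc123-xyz.s3.us-east-1.vpce.amazonaws.com",
		"https://bucket.vpce-0abc123-xyz.s3.us-east-1.vpce.amazonaws.com",
		"https://vpce-0abc123-xyz.s3.eu-west-1.vpce.amazonaws.com",
	} {
		fmt.Printf("%s -> %v\n", e, validateVPCEndpointRegex(e, "us-east-1"))
	}
}
```

Anchoring the pattern at both ends and capturing the region lets the check compare the region exactly instead of searching for it as a substring.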

@pull-request-size pull-request-size bot added size/L and removed size/M labels Sep 26, 2025
@puretension
Contributor Author

@JoaoBraveCoding Thank you so much for this thoughtful contribution! 🙏

Your regex-based validation approach is much more precise and handles edge cases I hadn't considered. The specific error messages for bucket name detection and region mismatches will make debugging so much easier for users facing VPC endpoint issues.

I really appreciate how you took the time to understand the underlying problem and improve the solution rather than just reviewing. This kind of collaborative problem-solving is what makes working on open source so rewarding.

I'd be happy to add support for accesspoint and control endpoints as you mentioned. Should I include this in the current PR to make it a complete VPC endpoint solution, or would you prefer a separate follow-up PR? I'll study the AWS documentation you referenced and implement the additional validation patterns.

Thank you for the learning experience and the opportunity to contribute further!

Collaborator

@JoaoBraveCoding JoaoBraveCoding left a comment

Thank you @puretension for being such a great contributor, your words are much appreciated! 🙏

Regarding support for the other endpoints: unless there is a good technical reason, I think we can postpone support for them, mainly because this also means less code.

@puretension
Contributor Author

@JoaoBraveCoding Thank you for the excellent improvements and thorough review!

I've accepted all your suggestions - the updated error messages are much clearer and removing bucket names from the test cases makes perfect sense for avoiding folder creation issues.

I completely agree on postponing accesspoint/control endpoints to keep the code simpler. I'll create a follow-up issue for those additional endpoint types. (I'm never going to miss them 😄)

Your collaborative approach and attention to detail have been amazing. I've learned a lot from this review process about Go best practices and error handling patterns.

Please let me know if there are any additional actions needed from my side. Thank you again for this great learning experience! 🙏

Collaborator

@JoaoBraveCoding JoaoBraveCoding left a comment

Thank you so much for the quick fixes and being such an awesome contributor! From my side it's a lgtm but I'll wait for @xperimental to also give it a review.

@JoaoBraveCoding
Collaborator

Seems like we still have a failing test, but apart from that it should be good.

- Correct VPC endpoint regex to match actual AWS format (vpce-id vs bucket.vpce-id)
- Update test error messages to match implementation
- Fix TestS3Extract/aws_s3_vpc_endpoint_wrong_region test case

Signed-off-by: puretension <[email protected]>
@puretension puretension force-pushed the fix/support-vpc-s3-endpoint-19243 branch from 066c0af to f4b77ac Compare October 15, 2025 06:30
@puretension
Contributor Author

Hi @JoaoBraveCoding!

I've fixed the failing test issue you mentioned.
The problem was with the VPC endpoint regex pattern and test assertions. I've corrected both the regex to match actual AWS VPC endpoint format and updated the test error messages accordingly.

Please let me know if there are any remaining issues or failure points.

@xperimental Could you please review this PR when you have a chance?

Thanks for your patience and support! 🙏

@ldiego73

Any updates? I'm also interested in using S3 VPC endpoints to reduce costs, and this fix would help us a lot

@JoaoBraveCoding
Collaborator

JoaoBraveCoding commented Nov 19, 2025

@puretension Any chance you could resolve the merge conflict? Sorry this is taking so long to merge, but our team has been quite overwhelmed. We will get to your PR ASAP.

@puretension
Contributor Author

@JoaoBraveCoding No problem! I've resolved the merge conflicts and pushed the changes.

@JoaoBraveCoding
Collaborator

@puretension seems like there was something wrong with the resolution of the merge

@puretension puretension force-pushed the fix/support-vpc-s3-endpoint-19243 branch from 73ba041 to ad1c98f Compare December 4, 2025 07:20
@puretension puretension force-pushed the fix/support-vpc-s3-endpoint-19243 branch from ad1c98f to 0852ee1 Compare December 4, 2025 07:25
@puretension
Contributor Author

@JoaoBraveCoding Thanks for pointing that out! I messed up the previous merge. 😅
I've fixed the conflicts properly now and also corrected a small bug in the VPC regex.
Tests are passing!

@amrin101

I have raised an RFE, https://issues.redhat.com/browse/RFE-8590, for this request. We opened this RFE just for internal tracking.

Development

Successfully merging this pull request may close these issues.

Inability to use a private VPC S3 endpoint with Loki

5 participants