-
Notifications
You must be signed in to change notification settings - Fork 3.9k
fix(operator): support private VPC S3 endpoints in Loki operator #19247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(operator): support private VPC S3 endpoints in Loki operator #19247
Conversation
JoaoBraveCoding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all thank you so much for contributing to the project! 🙏 It's much appreciated!
Overall look good just a small comment and I would like to ask you if you could rework the commit message/PR tittle so it shows fix(operator): ... so that your change is included in the release notes.
|
|
||
| // Check if it's a VPC endpoint format: https://bucket.vpce-*-region.s3.region.vpce.amazonaws.com | ||
| // or https://vpce-*-region.s3.region.vpce.amazonaws.com | ||
| if strings.Contains(endpoint, ".vpce.amazonaws.com") && strings.Contains(endpoint, region) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xperimental might correct me here but IIRC one of the points of introducing this validation was to avoid user from using endpoints that looked like https://bucketName.endpoint.com because this caused problems with users creating folders inside of buckets. So my suggestion would by to improve this so that we would allow vpce endpoints but not endpoints that had the bucket name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoaoBraveCoding Thanks for the feedback! I've updated both the commit message to use the fix(operator): format and improved the VPC endpoint validation logic.
You're absolutely right about the bucket name issue. I've changed the validation to only allow general VPC endpoints (vpce-) and reject bucket-specific ones (bucket.vpce-) to prevent the folder creation problems. The hostname check now ensures we only accept endpoints that start with "vpce-" rather than allowing any VPC endpoint format.
All tests are passing including a new test case that confirms bucket-specific VPC endpoints get rejected properly. Ready for another look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick fix! Now the only thing would be changing it so it matches the error flow of the rest of the function. In go generally it's good practice to make the error cases the branching paths of a function. This will also allow us to return better error messages. Take for instance the first test case you added, the problem with the config is that the bucket name is in the endpoint, however the error we are returning doesn't match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoaoBraveCoding Thanks for the detailed feedback and the Go style guide reference.👍🏻
I've restructured the validation function to follow Go's "Indent Error Flow" pattern with error cases
handled first and early returns.
Also refactored the code to use shorter variable names (u, host) and clearer comments following Go conventions.
The bucket-specific VPC endpoint now returns a much more specific error message that directly tells users
what's wrong with their configuration.
Thanks again for the link that error flow documentation is going to be useful for future code contribution!
All tests are passing and ready for another look. 🙏
fb41bd0 to
12a934a
Compare
This PR fixes the Loki operator's overly strict S3 endpoint validation that was rejecting private VPC S3 endpoints. The operator was only accepting the standard AWS S3 endpoint format and failing to reconcile when users configured private VPC endpoints in OpenShift environments. The fix updates the validateS3Endpoint function to accept VPC endpoint formats while preventing bucket-specific VPC endpoints that could cause folder creation issues: - Allows: https://vpce-*-region.s3.region.vpce.amazonaws.com (general VPC endpoint) - Rejects: https://bucket.vpce-*-region.s3.region.vpce.amazonaws.com (bucket-specific VPC endpoint) This maintains full backward compatibility with existing standard AWS S3 endpoints while adding support for VPC endpoints without the bucket name prefix that could lead to unwanted folder creation in S3 buckets. Fixes grafana#19243 Signed-off-by: puretension <[email protected]>
12a934a to
3f4bf2c
Compare
- Follow Go's 'Indent Error Flow' pattern by handling error cases first - Provide more specific error message for bucket-specific VPC endpoints - Maintain backward compatibility with existing validation logic - Update test to reflect clearer error messaging Signed-off-by: puretension <[email protected]>
- Use shorter variable names following Go conventions (u, host) - Add clear English comments for each validation section - Improve function signature with grouped parameters - Maintain same validation logic and error handling Signed-off-by: puretension <[email protected]>
- Revert variable name from 'u' to 'parsedURL' for better readability - Keep AWS validation logic within conditional block structure - Define VPC bucket name error as constant for consistency Signed-off-by: puretension <[email protected]>
JoaoBraveCoding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internally me and @xperimental discussed briefly your PR and he suggested trying out doing a validation with a regex, so based on your PR I did a commit and opened a PR puretension#1. While I was at it I also tested this against an actual VPC bucket to check if we were adding the right support for the endpoint. I think the only thing we might be missing is supporting accesspoint and control not only bucket but today I was only left with time to test bucket, took this info from here https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html#accessing-bucket-and-aps-from-interface-endpoints.
refactor to use regex + e2e testing
|
@JoaoBraveCoding Thank you so much for this thoughtful contribution! 🙏 Your regex-based validation approach is much more precise and handles edge cases I hadn't considered. The specific error messages for bucket name detection and region mismatches will make debugging so much easier for users facing VPC endpoint issues. I really appreciate how you took the time to understand the underlying problem and improve the solution rather than just reviewing. This kind of collaborative problem-solving is what makes working on open source so rewarding. I'd be happy to add support for Thank you for the learning experience and the opportunity to contribute further! |
JoaoBraveCoding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @puretension for being such a great contributor, your words are much appreciated! 🙏
Regarding support for the other endpoints unless there is a good technical reason I think we can postpone support for them. Mainly because this would also mean less code.
|
@JoaoBraveCoding Thank you for the excellent improvements and thorough review! I've accepted all your suggestions - the updated error messages are much clearer and removing bucket names from the test cases makes perfect sense for avoiding folder creation issues. I completely agree on postponing accesspoint/control endpoints to keep the code simpler. I'll create a follow-up issue for those additional endpoint types. (I never gonna miss them 😄) Your collaborative approach and attention to detail has been amazing. I've learned a lot from this review process about Go best practices and error handling patterns. Please let me know if there are any additional actions needed from my side. Thank you again for this great learning experience! 🙏 |
JoaoBraveCoding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the quick fixes and being such an awesome contributor! From my side it's a lgtm but I'll wait for @xperimental to also give it a review.
|
Seems like we still have a failing test but apart from that should be good |
- Correct VPC endpoint regex to match actual AWS format (vpce-id vs bucket.vpce-id) - Update test error messages to match implementation - Fix TestS3Extract/aws_s3_vpc_endpoint_wrong_region test case Signed-off-by: puretension <[email protected]>
066c0af to
f4b77ac
Compare
|
Hi @JoaoBraveCoding! I've fixed the failing test issue you mentioned. Please let me know if there are any remaining issues or failure points. @xperimental Could you please review this PR when you have a chance? Thanks for your patience and support! 🙏 |
|
Any updates? I'm also interested in using S3 VPC endpoints to reduce costs, and this fix would help us a lot |
|
@puretension Any chance you could resolve the merge conflict. Sorry this taking long to merge but our team has been quite overwhelmed we will get to you PR ASAP |
|
@JoaoBraveCoding No problem! I've resolved the merge conflicts and pushed the changes. |
|
@puretension seems like there was something wrong with the resolution of the merge |
73ba041 to
ad1c98f
Compare
Signed-off-by: puretension <[email protected]>
ad1c98f to
0852ee1
Compare
Signed-off-by: Dohyeong Lee <[email protected]>
|
@JoaoBraveCoding Thanks for pointing that out! I messed up the previous merge. 😅 |
|
I have raise an RFE https://issues.redhat.com/browse/RFE-8590 for this query. We have opened this RFE just for internal tracking |
What this PR does / why we need it:
This PR fixes the Loki operator's overly strict S3 endpoint validation that was rejecting private VPC S3 endpoints. The operator was only accepting the standard AWS S3 endpoint format (
https://s3.region.amazonaws.com) and failing to reconcile when users configured private VPC endpoints in OpenShift environments.The fix updates the
validateS3Endpointfunction to also accept VPC endpoint formats:https://bucket.vpce-*-region.s3.region.vpce.amazonaws.com(bucket-specific VPC endpoint)https://vpce-*-region.s3.region.vpce.amazonaws.com(general VPC endpoint)Which issue(s) this PR fixes:
Fixes #19243
Special notes for your reviewer:
This change maintains full backward compatibility with existing standard AWS S3 endpoints while adding support for VPC endpoints. The validation logic now:
.vpce.amazonaws.comand the specified region)The fix includes comprehensive test cases covering both VPC endpoint formats to ensure the validation works correctly.
Checklist
CONTRIBUTING.mdguide (required)docs/sources/setup/upgrade/_index.md(not applicable - this is a bug fix that enables previously broken functionality)deprecated-config.yamlanddeleted-config.yamlfiles respectively (not applicable - no configuration changes)Changes Made
Code Changes
operator/internal/handlers/internal/storage/secrets.go:validateS3Endpoint()function to support VPC endpoint validationTest Changes
operator/internal/handlers/internal/storage/secrets_test.go:Validation Logic
Before (rejected VPC endpoints):
After (accepts both standard and VPC endpoints):
Impact
This fix enables users to successfully deploy Loki in OpenShift environments with private VPC S3 endpoints, resolving the operator reconciliation failures described in issue #19243.