feat: #12442 improve s3 upload speed#15260
feat: #12442 improve s3 upload speed#15260antoinetran wants to merge 11 commits intoargoproj:mainfrom
Conversation
|
I tested with this branch https://github.com/antoinetran/argo-workflows/tree/v3.7.1-hardcode-s3-multithread that is created from v3.7.1 in my test cluster with argo v3.7.1. But this pull request is in a branch created from main today. |
42ece45 to
2176e40
Compare
6510f95 to
ffecf36
Compare
2e4048e to
a46a737
Compare
|
We did thorough benchmarks and here are the results. Test env: private kubernetes cluster with private s3 server 1/ shows upload speeds at ~30MiB/s My understanding of |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThese changes implement environment-driven configuration for S3 multipart upload parallelization. New environment variables control the number of upload threads and part size, with validation logic and Minio integration in the S3 artifact handler. Documentation and spell-check entries support the new functionality. Changes
Sequence DiagramsequenceDiagram
participant App as Application
participant Env as Environment
participant S3Handler as S3 Handler Functions
participant Minio as Minio Client
participant S3 as S3 Storage
App->>S3Handler: PutFile(filePath, data)
S3Handler->>Env: getFromEnvS3UploadNbThreads()
Env-->>S3Handler: numThreads (or default 4)
S3Handler->>Env: getFromEnvS3UploadPartSizeMiB()
Env-->>S3Handler: partSizeMiB (or default -1)
S3Handler->>S3Handler: getCheckedPartSize(filePath, partSizeMiB)
Note over S3Handler: Validate: 5 MiB ≤ size ≤ 5 GiB<br/>and size ≤ file size
S3Handler-->>S3Handler: validPartSize (or error)
S3Handler->>Minio: PutObjectOptions with<br/>ConcurrentStreamParts=true<br/>NumThreads=numThreads<br/>PartSize=validPartSize
Minio->>S3: Upload with parallel<br/>multipart streams
S3-->>Minio: Upload complete
Minio-->>S3Handler: Success
S3Handler-->>App: File uploaded
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@workflow/artifacts/s3/s3.go`:
- Around line 508-528: The getFromEnvS3UploadNbThreads function currently
accepts any integer from the env var and may allow zero/negative values that
later get cast to uint; after parsing nbThreads with strconv.Atoi, validate that
nbThreads >= 1 and if not set nbThreads = defaultThreads and log an Error
(include common.EnvAgentS3UploadThreads and the invalid nbThreadsStr/value) so
the code never returns 0 or a negative number that could wrap when cast to uint
for MinIO calls.
…oj#12442) Signed-off-by: antoinetran <antoinetran@users.noreply.github.com>
…oj#12442) Signed-off-by: antoinetran <antoinetran@users.noreply.github.com>
Signed-off-by: antoinetran antoinetran@users.noreply.github.com Signed-off-by: antoinetran <antoinetran@users.noreply.github.com>
Signed-off-by: antoinetran <antoinetran@users.noreply.github.com>
Signed-off-by: antoinetran <antoinetran@users.noreply.github.com>
Signed-off-by: antoinetran <antoinetran@users.noreply.github.com>
a46a737 to
5548272
Compare
|
Can someone rerun the test? One test failed but I guess it is a random fail that has nothing to do with my change. |
|
Hi! I don't know if there is anything more I can do on this? ping @Joibel ? Thanks! |
Joibel
left a comment
There was a problem hiding this comment.
This is really a feature. Please could you retitle it as such, and add the necessary feature description file. If you'd like it backported I think we can make an exception for this and put it in 4.0.
wording only Co-authored-by: Alan Clucas <alan@clucas.org> Signed-off-by: antoinetran <antoinetran@users.noreply.github.com>
wording only Co-authored-by: Alan Clucas <alan@clucas.org> Signed-off-by: antoinetran <antoinetran@users.noreply.github.com>
wording only Co-authored-by: Alan Clucas <alan@clucas.org> Signed-off-by: antoinetran <antoinetran@users.noreply.github.com>
Thanks for the feedback! I renamed to "feat" this PullRequest, and I created a feature issue #15636 and linked it to this ticket. Added the feature description file (MD file).
I think you meant if I want to backport it to argo workflows 3.X instead of 4.X? Because the latest tags are 4.X. I am ok with not backporting it to 3.X and migrating to 4.X when this is merged and tagged as a 4.X release. |
… s3 upload speed argoproj#15260 and argoproj#12442) Signed-off-by: antoinetran <antoinetran@users.noreply.github.com>
…upload speed argoproj#15260 and argoproj#12442) Signed-off-by: antoinetran <antoinetran@users.noreply.github.com>


Fixes #12442 improve s3 upload speed (issue ticket)
Fixes #15636 (feature ticket)
Motivation
The need:
Upload in argo wait container is "slow" (23min for 35GiB) while a file download is fast (2min for 35GiB) to write on disk. More benchmark will follow to confirm.
The how:
The argo code uses minio but does not allow the number of threads for S3 multipart upload, nor does it allow setting the part size. Reading the minio code, it seems the default threads is 4 and the default part size is calculated accordingly to the file size, but generally it is 16MiB (for file size <= 156GiB). This PR allows setting both of them with argo env var, as it is the case for other argo params for argoexec image.
Modifications
The change is to add env var in doc and read them, if they are defined, to set the part size and threads number. The part size is also checked so that it is a valid value for Minio: 5MiB < part size < 5GiB and file size > part size.
Verification
Workflow to test:
Expected result in wait logs:
Documentation
This page https://argo-workflows.readthedocs.io/en/latest/environment-variables/ will be update thanks to the related commit that add doc.
Summary by CodeRabbit
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.