[SPARK-37509][CORE] Improve Fallback Storage upload speed by avoiding S3 rate limiter #34762
Conversation
Kubernetes integration test starting

Kubernetes integration test status failure

Could you review this, please, @viirya?
Test build #145776 has finished for PR 34762 at commit
lgtm

Thank you so much, @viirya!
Only just seen this. I'm actually thinking that the feature to go into s3a this year should be configurable rate limiting through the Guava RateLimiter; I'm using this in the abfs committer to keep committer I/O below the limits where throttling starts to cause problems with renames. This is all per-process; things like random filenames are still going to be critical to spread load on S3.
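To make the per-process rate-limiting idea concrete, here is a minimal, self-contained sketch in the spirit of Guava's `RateLimiter` that the comment above refers to. The class and method names below are illustrative only; this is not the actual s3a or abfs committer code, just a toy pacer that spaces requests at a configured permits-per-second rate within one JVM.

```java
// Toy sketch of per-process client-side rate limiting (illustrative, not s3a code).
public class SimpleRateLimiter {
    private final double permitsPerSecond;
    private long nextFreeTicketNanos = System.nanoTime();

    public SimpleRateLimiter(double permitsPerSecond) {
        this.permitsPerSecond = permitsPerSecond;
    }

    /** Blocks until a permit is available; returns seconds spent waiting. */
    public synchronized double acquire() throws InterruptedException {
        long now = System.nanoTime();
        long waitNanos = Math.max(0L, nextFreeTicketNanos - now);
        // Reserve the next slot: one permit every 1/permitsPerSecond seconds.
        long intervalNanos = (long) (1e9 / permitsPerSecond);
        nextFreeTicketNanos = Math.max(now, nextFreeTicketNanos) + intervalNanos;
        if (waitNanos > 0) {
            Thread.sleep(waitNanos / 1_000_000, (int) (waitNanos % 1_000_000));
        }
        return waitNanos / 1e9;
    }

    public static void main(String[] args) throws Exception {
        // Cap this process at 2 requests/second; the 2nd and 3rd acquire must wait.
        SimpleRateLimiter limiter = new SimpleRateLimiter(2.0);
        long start = System.nanoTime();
        for (int i = 0; i < 3; i++) {
            limiter.acquire();
        }
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        System.out.println(elapsedSeconds >= 0.9); // prints true: ~1s of pacing
    }
}
```

Note that, as the comment says, this only caps a single process; spreading load across S3 prefixes (e.g. via randomized paths) is still needed when thousands of executors upload concurrently.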
I didn't set any configurations. When a customer does not make a special request, the pre-defined throttling is applied by default (3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix). In the worst notebook scenario, 3,500 executors may start to decommission at the same time, which causes throttling.
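The back-of-envelope arithmetic behind that worry can be sketched as follows. The file count per executor and the number of prefixes are illustrative assumptions, not measurements from the PR; the PUT limit is S3's documented default per-prefix rate.

```java
public class ThrottleMath {
    public static void main(String[] args) {
        int executors = 3500;             // worst-case simultaneous decommissions (from the comment)
        int filesPerExecutor = 2;         // e.g. one .data + one .index upload (illustrative)
        int putLimitPerPrefix = 3500;     // S3 default PUT/COPY/POST/DELETE per second per prefix

        int totalPuts = executors * filesPerExecutor;       // 7000 requests in a burst
        System.out.println(totalPuts > putLimitPerPrefix);  // prints true: one prefix throttles

        int prefixes = 8;                 // hypothetical spread across randomized prefixes
        System.out.println(totalPuts / prefixes);           // prints 875, well under the limit
    }
}
```

So even a modest fan-out across prefixes brings the per-prefix request rate back under the default limit.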
OK. You should try enabling directory marker retention everywhere and see what that does to your workload. The auditing stuff in 3.3.2 will help you identify which jobs and operations are generating the I/O.
They are orthogonal dimensions to this PR, aren't they?
Yeah, you should do both. This is best as it will spread the load across all the shards a bucket has.
… S3 rate limiter (apache#1379)

### What changes were proposed in this pull request?

This PR aims to improve `Fallback Storage` upload speed by randomizing the path in order to avoid the S3 rate limiter.

### Why are the changes needed?

Currently, `Fallback Storage` is using `a single prefix per shuffle`. This PR aims to randomize the upload prefixes even in a single shuffle to avoid the S3 rate limiter.
- https://aws.amazon.com/premiumsupport/knowledge-center/s3-request-limit-avoid-throttling/

### Does this PR introduce _any_ user-facing change?

No. This is used internally during the runtime.

### How was this patch tested?

Pass the CIs to verify read and write operations. To check the layout, check the uploaded path manually with the following configs.

```
spark.decommission.enabled true
spark.storage.decommission.enabled true
spark.storage.decommission.shuffleBlocks.enabled true
spark.storage.decommission.fallbackStorage.path file:///tmp/fallback/
```

Start one master and worker. Connect with `spark-shell` and generate shuffle data.

```
scala> sc.parallelize(1 to 11, 10).map(x => (x % 3, 1)).reduceByKey(_ + _).count()
res0: Long = 3
```

Invoke decommission and check. Since we have only one worker, the shuffle data goes to the fallback storage directly.

```
$ kill -PWR <CoarseGrainedExecutorBackend JVM PID>
$ tree /tmp/fallback
/tmp/fallback
└── app-20211130135922-0001
    └── 0
        ├── 103417883
        │   └── shuffle_0_7_0.data
        ├── 1036881592
        │   └── shuffle_0_4_0.data
        ├── 1094002679
        │   └── shuffle_0_7_0.index
        ├── 1393510154
        │   └── shuffle_0_6_0.index
        ├── 1515275369
        │   └── shuffle_0_3_0.data
        ├── 1541340402
        │   └── shuffle_0_2_0.index
        ├── 1639392452
        │   └── shuffle_0_8_0.data
        ├── 1774061049
        │   └── shuffle_0_9_0.index
        ├── 1846228218
        │   └── shuffle_0_6_0.data
        ├── 1970345301
        │   └── shuffle_0_1_0.data
        ├── 2073568524
        │   └── shuffle_0_4_0.index
        ├── 227534966
        │   └── shuffle_0_2_0.data
        ├── 266114061
        │   └── shuffle_0_3_0.index
        ├── 413944309
        │   └── shuffle_0_5_0.index
        ├── 581811660
        │   └── shuffle_0_0_0.data
        ├── 705928743
        │   └── shuffle_0_5_0.data
        ├── 713451784
        │   └── shuffle_0_8_0.index
        ├── 861282032
        │   └── shuffle_0_0_0.index
        ├── 912764509
        │   └── shuffle_0_9_0.data
        └── 946172431
            └── shuffle_0_1_0.index
```

Closes apache#34762 from dongjoon-hyun/SPARK-37509.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit ca25534)
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit c88b258)
Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
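The numeric directories in the tree above are the randomized prefixes that spread uploads within one shuffle across S3's per-prefix rate limits. The sketch below illustrates the general idea with a hash-derived prefix; the method name, layout string, and hash choice here are hypothetical and are not the actual `FallbackStorage` implementation.

```java
public class PrefixedPath {
    // Hypothetical sketch: derive a numeric prefix from the file name so that
    // uploads within a single shuffle fan out across many prefixes, instead of
    // all landing under one prefix. Not the actual FallbackStorage code.
    static String uploadPath(String appId, int shuffleId, String fileName) {
        int prefix = Math.abs(fileName.hashCode());
        return appId + "/" + shuffleId + "/" + prefix + "/" + fileName;
    }

    public static void main(String[] args) {
        // The .data and .index files of the same map output hash to different
        // prefixes, so a burst of uploads is spread across the bucket.
        System.out.println(uploadPath("app-20211130135922-0001", 0, "shuffle_0_7_0.data"));
        System.out.println(uploadPath("app-20211130135922-0001", 0, "shuffle_0_7_0.index"));
    }
}
```

Combined with the per-process rate limiting discussed earlier in the thread, this addresses both sides of the throttling problem: each client paces itself, and the aggregate load is spread across prefixes.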