Open
Labels
area/client/spark, good first issue, performance, team/ecosystem
Description
Discovered during a recent large-scale run. At large scale, sweeping on AWS S3 hits numerous "SlowDown" responses from deleteObjects and, even while being throttled, keeps overloading S3 so that it makes no progress. I had to reduce to 3 executor VMs, with 12 executor threads in total, before the sweep made progress.
Proposed fix: rate-limit the sweep. One option is to repartition the work into only 12 partitions, but that produces 12 huge partitions. Ideally we would keep many partitions and reduce only the run parallelism to ~12 (configurable) for this phase.
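
To illustrate the difference between "few partitions" and "many partitions, low parallelism", here is a minimal sketch in plain Python. It is not Spark code and not the client's implementation. `sweep_partitions`, `fake_delete_batch`, and `max_parallelism` are hypothetical names, and the fake delete function only stands in for a deleteObjects call. The work stays split into 100 small partitions, but at most 12 run at any moment.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def sweep_partitions(partitions, delete_batch, max_parallelism=12):
    """Run delete_batch on every partition, at most max_parallelism at once.

    Returns the per-partition results and the peak number of partitions
    that were being processed concurrently.
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def run_one(partition):
        # Track how many partitions are in flight, to observe the cap.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return delete_batch(partition)
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the parallelism cap; the partition count is independent.
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        results = list(pool.map(run_one, partitions))
    return results, state["peak"]


def fake_delete_batch(keys):
    """Hypothetical stand-in for one deleteObjects request."""
    time.sleep(0.001)
    return len(keys)


# 100 small partitions of 5 keys each, swept with at most 12 in flight.
partitions = [[f"obj-{p}-{i}" for i in range(5)] for p in range(100)]
deleted, peak = sweep_partitions(partitions, fake_delete_batch, max_parallelism=12)
print(sum(deleted), "objects swept; peak parallelism", peak)
```

The point of the sketch is only that the parallelism cap and the partition count are separate knobs. How to get the same effect for a single phase of a Spark job is the open question of this issue.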