[SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function #32667
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -2067,7 +2067,7 @@ def add_shuffle_key(split, iterator):` |
| 2067 | 2067 | `      avg = int(size / n) >> 20` |
| 2068 | 2068 | `      # let 1M < avg < 10M` |
| 2069 | 2069 | `      if avg < 1:` |
| 2070 | | `-         batch *= 1.5` |
| | 2070 | `+         batch = min(sys.maxsize, batch * 1.5)` |
| 2071 | 2071 | `      elif avg > 10:` |
| 2072 | 2072 | `          batch = max(int(batch / 1.5), 1)` |
| 2073 | 2073 | `      c = 0` |
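
For readers following the change, here is a minimal standalone sketch of the failure mode and of the clamp. It is not the PySpark code itself: the helper names `grow_unclamped`, `grow_clamped`, and `shrink` are hypothetical, and the starting batch of 1000 and the step count are arbitrary values picked for illustration. It shows that repeated `batch *= 1.5` turns `batch` into a float that eventually becomes `inf`, at which point `int(batch / 1.5)` raises the `OverflowError` from the title, while the `min(sys.maxsize, ...)` form keeps the value finite.

```python
import math
import sys


def grow_unclamped(batch, steps):
    """Old behaviour: multiply by 1.5 with no upper bound (hypothetical helper)."""
    for _ in range(steps):
        batch *= 1.5  # becomes a float, and eventually float('inf')
    return batch


def grow_clamped(batch, steps):
    """Behaviour after the change: cap the growth at sys.maxsize (hypothetical helper)."""
    for _ in range(steps):
        batch = min(sys.maxsize, batch * 1.5)
    return batch


def shrink(batch):
    """Mirrors the `elif avg > 10` branch from the diff."""
    return max(int(batch / 1.5), 1)


if __name__ == "__main__":
    runaway = grow_unclamped(1000, 2000)
    print(math.isinf(runaway))
    try:
        shrink(runaway)
    except OverflowError as exc:
        print("unclamped:", exc)

    capped = grow_clamped(1000, 2000)
    print(capped == sys.maxsize)
    print("clamped shrink still works:", shrink(capped))
```

Note that the clamped value can still be a float (for example `1500.0`) until it reaches the cap; the clamp only guarantees that it stays finite, so the later `int(...)` conversion cannot overflow.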
Actually, when `get_used_memory() > limit` is true, I don't know why we would want to increase the batch with `batch *= 1.5`.
I guess it is to increase the batch size and use more memory?
Hm, I thought increasing `batch` was meant for the `c > batch` case. In other words, it increases the batch size when `c` reaches the current batch size while used memory is still under `limit` (and the average bucket size is small).

If it reaches the memory limit before reaching the batch size (which means the current batch size is more than the memory limit allows), it does not seem to make sense to increase the batch size, even if the average bucket size is small.
Agreed. The batch size should not increase when the memory limit is reached.
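
To make the suggestion in this thread concrete, here is a rough sketch of what "do not grow the batch when the memory limit was hit" could look like. This is an assumption for discussion only, not what this PR changes: the function `next_batch` and its parameters `avg_mb` and `over_limit` are hypothetical names, with `over_limit` standing in for the result of `get_used_memory() > limit`.

```python
import sys


def next_batch(batch, avg_mb, over_limit):
    """Pick the next batch size (hypothetical helper, not PySpark API).

    batch      -- current batch size
    avg_mb     -- average bucket size in MB (the `avg` value in the diff)
    over_limit -- True if the flush was triggered by the memory limit
    """
    if avg_mb < 1 and not over_limit:
        # Buckets are small and memory is fine: grow, but stay finite.
        return min(sys.maxsize, batch * 1.5)
    if avg_mb > 10:
        # Buckets are large: shrink, never below 1.
        return max(int(batch / 1.5), 1)
    # Either the size is in range or memory is already tight: keep as is.
    return batch


if __name__ == "__main__":
    print(next_batch(1000, 0, over_limit=False))  # grows
    print(next_batch(1000, 0, over_limit=True))   # unchanged
    print(next_batch(1000, 20, over_limit=True))  # shrinks
```

With this shape, growth only happens on the `c > batch` path discussed above, while the shrink branch still applies regardless of what triggered the flush.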