[SPARK-7968][CORE] Rename minPartitions to maxPartitions in wholeTextFiles/binaryFiles #6518
This does not appear to be a SQL patch.
it's strange that this still defaults to defaultMinPartitions. Does that need to be fixed as well?
Because I don't know what the best default value would be. Would one be OK?
ok to test. @srowen does this seem valid to you?
Also, it seems that here we're changing parameter names of public APIs. This is not backward compatible, right?
Test build #35172 has finished for PR 6518 at commit
@andrewor14 You mean the code like
Yes, but that by itself may be a reason why we can't merge this patch.
@andrewor14 Can we add new method I want to rename strongly because
I don't understand this. The value
@srowen |
The number of partitions is generally equal to the number of files. I don't think it can be less; it can be more. Really, the minPartitions setting rarely does anything; it's just a suggestion anyway. It might cause Hadoop to return multiple splits per file (which would be bad here actually). But it is definitely not a maximum and you can see this is incorrect, as it's passed to
I suppose I'd argue that this arg should go away entirely as it seems like it can only hurt. At this point though it exists, and its name and doc are correct relative to what they do.
This commit exists to close the following pull requests on Github: Closes #2849 (close requested by 'srowen') Closes #2786 (close requested by 'andrewor14') Closes #4678 (close requested by 'JoshRosen') Closes #5457 (close requested by 'andrewor14') Closes #3346 (close requested by 'andrewor14') Closes #6518 (close requested by 'andrewor14') Closes #5403 (close requested by 'pwendell') Closes #2110 (close requested by 'srowen')
The actual number of partitions (tasks) in wholeTextFiles/binaryFiles is less than or equal to minPartitions, so maxPartitions is a better name than minPartitions.
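For anyone who wants to reason about this claim without a cluster, here is a rough, self-contained Python model. It is not Spark or Hadoop code. It assumes that the partition hint is turned into a maximum split size of ceil(totalLen / minPartitions), and that whole, unsplittable files are packed greedily into a split until that size is reached. The function names `max_split_size` and `pack_files` are hypothetical, and the model ignores node and rack locality, so treat it as a sketch of the argument rather than a statement of what the input format actually does.

```python
def max_split_size(file_sizes, min_partitions):
    """Assumed conversion of the partition hint into a split-size cap.

    Computes ceil(total_len / min_partitions) with integer arithmetic,
    treating a hint of 0 as 1 to avoid division by zero.
    """
    total_len = sum(file_sizes)
    divisor = min_partitions if min_partitions > 0 else 1
    return (total_len + divisor - 1) // divisor


def pack_files(file_sizes, min_partitions):
    """Greedily pack whole files into splits (simplified, hypothetical model).

    A split is closed as soon as its accumulated size reaches the cap.
    Files are never divided, so there can be no more splits than files.
    """
    cap = max_split_size(file_sizes, min_partitions)
    splits = []
    current = []
    current_size = 0
    for size in file_sizes:
        current.append(size)
        current_size += size
        if current_size >= cap:
            splits.append(current)
            current = []
            current_size = 0
    if current:
        # Leftover files that never reached the cap form a final split.
        splits.append(current)
    return splits
```

In this model, ten files of equal size with a hint of 4 produce 4 splits, while a hint of 100 produces 10 splits, one per file. Under these assumptions the hint behaves like an upper bound, but that conclusion depends entirely on the simplified packing rule and may not hold for the real input format.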