Skip to content

Conversation

@watermen
Copy link
Contributor

The actual number of partition(task) in wholeTextFiles/binaryFiles is less than or equal to minPartitions, so maxPartitions is better than minPartitions.

@marmbrus
Copy link
Contributor

This does not appear to be a SQL patch.

@watermen watermen changed the title [SPARK-7968][SQL] Rename minPartitions to maxPartitions in wholeTextFiles/binaryFiles [SPARK-7968][CORE] Rename minPartitions to maxPartitions in wholeTextFiles/binaryFiles Jun 15, 2015
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's strange that this still defaults to defaultMinPartitions. Does that need to be fixed as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because i don‘t know what is the best defalut value, one will be ok?

@andrewor14
Copy link
Contributor

ok to test. @srowen does this seem valid to you?

@andrewor14
Copy link
Contributor

Also, it seems that here we're changing parameter names of public APIs. This is not backward compatible right?

@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35172 has finished for PR 6518 at commit cd64289.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@watermen
Copy link
Contributor Author

@andrewor14 You mean the code like wholeTextFiles("path", minPartitions = 1)? Others will not have the problem of backward compatible.

@andrewor14
Copy link
Contributor

Yes, but that by itself maybe a reason why we can't merge this patch.

@watermen
Copy link
Contributor Author

@andrewor14 Can we add new method newWholeTextFiles(path: String, maxPartitions: Int)? and

wholeTextFiles(path: String, minPartitions: Int = defaultMinPartitions) {
  newWholeTextFiles(path, minPartitions)
}

I want to rename strongly because minPartitions will mislead users.

@srowen
Copy link
Member

srowen commented Jun 19, 2015

I don't understand this. The value minPartitions is passed to InputFormat.setMinPartitions. Why do you think it's a max?

@watermen
Copy link
Contributor Author

@srowen wholeTextFiles used to combine small files, and the actual number of partition(task) is less than or equal to partitionsNum user passed.

@srowen
Copy link
Member

srowen commented Jun 19, 2015

The number of partitions is generally equal to the number of files. I don't think it can be less; it can be more. Really, the minPartitions setting rarely does anything; it's just a suggestion anyway. It might cause Hadoop to return multiple splits per file (which would be bad here actually). But it is definitely not a maximum and you can see this is incorrect, as it's passed to setMinPartitions.

I suppose I'd argue that this arg should go away entirely as it seems like it can only hurt. At this point though it exists, and its name and doc is correct relative to what it does.

@andrewor14
Copy link
Contributor

@watermen I think the decision then is to not merge this patch. Whether or not the intended change is correct (and it is likely not, according to @srowen), this breaks public API backward compatibility. Would you mind closing this PR then? Thanks for your work.

@watermen watermen closed this Jun 23, 2015
asfgit pushed a commit that referenced this pull request Jun 23, 2015
This commit exists to close the following pull requests on Github:

Closes #2849 (close requested by 'srowen')
Closes #2786 (close requested by 'andrewor14')
Closes #4678 (close requested by 'JoshRosen')
Closes #5457 (close requested by 'andrewor14')
Closes #3346 (close requested by 'andrewor14')
Closes #6518 (close requested by 'andrewor14')
Closes #5403 (close requested by 'pwendell')
Closes #2110 (close requested by 'srowen')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants