-
Notifications
You must be signed in to change notification settings - Fork 29k
[Minor][ML]Minor correction in the powerIterationSuite #21689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| assert(localAssignments === expectedResult) | ||
|
|
||
| val predictions = Array.fill(2)(mutable.Set.empty[Long]) | ||
| assignments.select("id", "cluster").collect().foreach { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think this was clearer with .as[(Long,Int)] as it avoids matching Row. I don't feel strongly about it; just less change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I have modified.
| assignments.select("id", "cluster").collect().foreach { | ||
| case Row(id: Long, cluster: Integer) => predictions(cluster) += id | ||
| } | ||
| assert(predictions.toSet == Set((0 until n1).toSet, (n1 until n).toSet)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want === here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. done.
| .collect() | ||
|
|
||
| val predictions = Array.fill(2)(mutable.Set.empty[Long]) | ||
| assignments.foreach{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might fail the style checker -- need a space before the brace. Looks good otherwise. Let's see if the test passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the space. Thank you.
|
Test build #4207 has finished for PR 21689 at commit
|
|
Merged to master |
|
Thank you @srowen for merging. |
What changes were proposed in this pull request?
Currently the power iteration clustering test in spark ml, maps the results to the labels 0 and 1 for assertion. Since the clustering outputs need not be the same as the mapped labels, it may cause failure in the test case. Even if it correctly maps, theoretically we cannot guarantee which set belongs to which cluster label. KMeans can assign label 0 to either of the set.
PowerIterationClusteringSuite in the MLLib checks the clustering results without mapping to the particular cluster label, as shown below.
val predictions = Array.fill(2)(mutable.Set.empty[Long]) model.assignments.collect().foreach { a => predictions(a.cluster) += a.id } assert(predictions.toSet == Set((0 until n1).toSet, (n1 until n).toSet))How was this patch tested?
Existing tests