@shahidki31
Contributor

@shahidki31 shahidki31 commented Jul 1, 2018

What changes were proposed in this pull request?

Currently, the power iteration clustering test in Spark ML maps the results to the labels 0 and 1 before asserting. Because the clustering output need not match the mapped labels, the test can fail. Even when the mapping happens to be correct, nothing guarantees which set receives which cluster label: KMeans can assign label 0 to either set.

PowerIterationClusteringSuite in MLlib checks the clustering results without mapping to a particular cluster label, as shown below.

val predictions = Array.fill(2)(mutable.Set.empty[Long])
model.assignments.collect().foreach { a =>
  predictions(a.cluster) += a.id
}
assert(predictions.toSet == Set((0 until n1).toSet, (n1 until n).toSet))
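The label-agnostic comparison described above can be tried without Spark. The following is a minimal sketch in plain Scala, not the code from the test suite: the object name `LabelAgnosticCheck`, the helper `groupByCluster`, and the toy assignments are all hypothetical and exist only for illustration. It groups ids by whatever label they received and compares the resulting set of sets, so swapping labels 0 and 1 does not change the outcome.

```scala
import scala.collection.mutable

// Hypothetical helper for illustration; not part of Spark.
object LabelAgnosticCheck {
  // Group point ids by assigned cluster label, then discard the labels
  // by returning the groups as an unordered set of sets.
  def groupByCluster(assignments: Seq[(Long, Int)], k: Int): Set[Set[Long]] = {
    val predictions = Array.fill(k)(mutable.Set.empty[Long])
    assignments.foreach { case (id, cluster) => predictions(cluster) += id }
    predictions.map(_.toSet).toSet
  }

  def main(args: Array[String]): Unit = {
    val n1 = 3
    val n = 6
    // Expected partition: ids 0 until n1 in one cluster, n1 until n in the other.
    val expected: Set[Set[Long]] =
      Set((0 until n1).map(_.toLong).toSet, (n1 until n).map(_.toLong).toSet)

    // Two toy outputs (assumed data) that differ only in which label each group got.
    val labelsA = (0 until n).map(i => (i.toLong, if (i < n1) 0 else 1))
    val labelsB = (0 until n).map(i => (i.toLong, if (i < n1) 1 else 0))

    assert(groupByCluster(labelsA, 2) == expected)
    assert(groupByCluster(labelsB, 2) == expected)

    // A label-dependent check accepts only one of the two equivalent outputs.
    val fixedLabels = labelsA.toSet
    assert(labelsA.toSet == fixedLabels)
    assert(labelsB.toSet != fixedLabels)
  }
}
```

Comparing sets of sets keeps the assertion focused on which points end up together, which is the only property the clustering is expected to guarantee.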

How was this patch tested?

Existing tests

@shahidki31 shahidki31 changed the title Minor correction in the powerIterationSuite [Minor][ML]Minor correction in the powerIterationSuite Jul 1, 2018
assert(localAssignments === expectedResult)

val predictions = Array.fill(2)(mutable.Set.empty[Long])
assignments.select("id", "cluster").collect().foreach {
Member

Nit: I think this was clearer with .as[(Long,Int)] as it avoids matching Row. I don't feel strongly about it; just less change.

Contributor Author

Thanks for the suggestion. I have modified it.
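For comparison, the two ways of collecting the assignments discussed in this thread might look like the sketch below. It assumes Spark SQL is on the classpath and that a local SparkSession is acceptable; the object name `TypedCollectSketch`, the helper names `viaRow` and `viaTuple`, and the four-row DataFrame are hypothetical stand-ins, not the suite's actual code or data.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import scala.collection.mutable

// Hypothetical sketch for illustration; not the code from the test suite.
object TypedCollectSketch {
  // Untyped collection: each Row has to be pattern matched field by field.
  def viaRow(assignments: DataFrame): Set[Set[Long]] = {
    val predictions = Array.fill(2)(mutable.Set.empty[Long])
    assignments.select("id", "cluster").collect().foreach {
      case Row(id: Long, cluster: Int) => predictions(cluster) += id
    }
    predictions.map(_.toSet).toSet
  }

  // Typed collection: .as[(Long, Int)] yields tuples, so no Row match is needed.
  def viaTuple(spark: SparkSession, assignments: DataFrame): Set[Set[Long]] = {
    import spark.implicits._
    val predictions = Array.fill(2)(mutable.Set.empty[Long])
    assignments.select("id", "cluster").as[(Long, Int)].collect().foreach {
      case (id, cluster) => predictions(cluster) += id
    }
    predictions.map(_.toSet).toSet
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("typed-collect-sketch")
      .getOrCreate()
    import spark.implicits._
    try {
      // Toy assignments (assumed data) with the same column names as in the diff.
      val assignments = Seq((0L, 1), (1L, 1), (2L, 0), (3L, 0)).toDF("id", "cluster")
      assert(viaRow(assignments) == viaTuple(spark, assignments))
    } finally {
      spark.stop()
    }
  }
}
```

Both variants build the same grouping; the typed one simply moves the field types into the `as[(Long, Int)]` call instead of the pattern.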

assignments.select("id", "cluster").collect().foreach {
case Row(id: Long, cluster: Integer) => predictions(cluster) += id
}
assert(predictions.toSet == Set((0 until n1).toSet, (n1 until n).toSet))
Member

I think we want === here?

Contributor Author

Yes, done.

.collect()

val predictions = Array.fill(2)(mutable.Set.empty[Long])
assignments.foreach{
Member

This might fail the style checker -- need a space before the brace. Looks good otherwise. Let's see if the test passes.

Contributor Author

I have added the space. Thank you.

@SparkQA

SparkQA commented Jul 4, 2018

Test build #4207 has finished for PR 21689 at commit b02cae5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jul 4, 2018

Merged to master

@asfgit asfgit closed this in ca8243f Jul 4, 2018
@shahidki31
Contributor Author

Thank you @srowen for merging.

@shahidki31 shahidki31 deleted the picTestSuiteMinorCorrection branch July 4, 2018 16:03