@viirya
Member

@viirya viirya commented Jan 22, 2015

If a point is selected as a new center in many runs, the current code collects a redundant copy of that point for each run. This PR refactors the code to avoid that.

@viirya viirya changed the title Refactor KMeans to reduce redundant data [SPARK-5365][MLlib] Refactor KMeans to reduce redundant data Jan 22, 2015
@srowen
Member

srowen commented Jan 22, 2015

So this returns (p, (r1, r2, r3, ...)) instead of (r1, p), (r2, p), (r3, p), ... Makes sense to me, especially if you have reason to believe this is a bottleneck somewhere.
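The difference between the two shapes can be sketched with plain Scala collections. This is only an illustration of the idea described above, not the code in the patch: `CenterCollectSketch`, `Point`, and every method name below are hypothetical, and no Spark API is used.

```scala
// Hypothetical sketch: plain collections stand in for the collected data.
object CenterCollectSketch {
  // Assumption: a point is modelled as a dense vector of doubles.
  type Point = Vector[Double]

  // Old shape: one (run, point) pair for every run that selected the point,
  // so a point chosen by k runs is materialized k times.
  def perRunPairs(selected: Seq[(Point, Seq[Int])]): Seq[(Int, Point)] =
    selected.flatMap { case (p, runs) => runs.map(r => (r, p)) }

  // New shape: each point appears once, paired with the runs that chose it.
  def perPoint(selected: Seq[(Point, Seq[Int])]): Seq[(Point, Seq[Int])] =
    selected.filter { case (_, runs) => runs.nonEmpty }

  // Total number of coordinates carried by each shape.
  def coordsInPairs(pairs: Seq[(Int, Point)]): Int =
    pairs.map(_._2.size).sum

  def coordsInGrouped(grouped: Seq[(Point, Seq[Int])]): Int =
    grouped.map(_._1.size).sum

  // The per-run centers can still be rebuilt locally from the grouped shape.
  def centersByRun(grouped: Seq[(Point, Seq[Int])]): Map[Int, Seq[Point]] =
    grouped
      .flatMap { case (p, runs) => runs.map(r => (r, p)) }
      .groupBy(_._1)
      .map { case (r, ps) => r -> ps.map(_._2) }
}
```

With a 3-dimensional point selected by three runs and another selected by one run, the per-run shape carries four point copies while the grouped shape carries two. The gap grows with the dimension of the points and the number of runs that share them.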

@viirya
Member Author

viirya commented Jan 22, 2015

This matters most when there are many runs and p is high-dimensional and selected in more than one run. Collecting the redundant copies of p is then wasteful and time-consuming.

@SparkQA

SparkQA commented Jan 22, 2015

Test build #25962 has finished for PR 4159 at commit 25487e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Jan 22, 2015

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 246111d Jan 22, 2015
@viirya viirya deleted the small_refactor_kmeans branch December 27, 2023 18:30
4 participants