Skip to content

Conversation

@mgaido91
Copy link
Contributor

@mgaido91 mgaido91 commented Mar 2, 2018

What changes were proposed in this pull request?

Silhouette need to know the number of features. This was taken using first and checking the size of the vector. Despite this works fine, if the number of attributes is present in metadata, we can avoid to trigger a job for this and use the metadata value. This can help improving performances of course.

How was this patch tested?

existing UTs + added UT

@SparkQA
Copy link

SparkQA commented Mar 2, 2018

Test build #87895 has finished for PR 20719 at commit 73db3b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 10, 2018

Test build #88151 has finished for PR 20719 at commit 2d64a90.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor Author

@holdenk @sethah @srowen @viirya may you please help reviewing this PR if you have time? Thanks.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so familiar with this, but it looks logical


protected def getNumberOfFeatures(dataFrame: DataFrame, columnName: String): Int = {
val group = AttributeGroup.fromStructField(dataFrame.schema(columnName))
group.numAttributes.getOrElse {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use size of AttributeGroup to determine its number of attributes. When it is -1, we can leverage first then.

@SparkQA
Copy link

SparkQA commented Mar 19, 2018

Test build #88375 has finished for PR 20719 at commit 7d284cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Mar 21, 2018

Merged to master

@asfgit asfgit closed this in 500b21c Mar 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants