Skip to content

Conversation

@WeichenXu123
Copy link
Contributor

What changes were proposed in this pull request?

Python API for DataFrame-based multivariate summarizer.

How was this patch tested?

doctest added.

@SparkQA
Copy link

SparkQA commented Feb 28, 2018

Test build #87778 has finished for PR 20695 at commit 488d45a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class Summarizer(object):

@SparkQA
Copy link

SparkQA commented Feb 28, 2018

Test build #87782 has finished for PR 20695 at commit 7d3cb1b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 28, 2018

Test build #87784 has finished for PR 20695 at commit 001ff46.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

2.4.0?

@SparkQA
Copy link

SparkQA commented Mar 1, 2018

Test build #87816 has finished for PR 20695 at commit b3e9ddd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WeichenXu123
Copy link
Contributor Author

@MrBago @holdenk Thanks!

@WeichenXu123
Copy link
Contributor Author

gentle ping @MrBago @yogeshg Thanks!

Copy link
Contributor

@MrBago MrBago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment, otherwise LGTM.

return Summarizer(js)

@since("2.4.0")
def summary(self, featureCol, weightCol=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to move the "summary" method into another class, and have Summary only contain static methods. That will help with autocomplete so that it's clear that you're not meant to do Summery.metrics("min").mean(features).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable.

@SparkQA
Copy link

SparkQA commented Mar 21, 2018

Test build #88463 has finished for PR 20695 at commit e64f795.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SummarizerBuilder(object):

@SparkQA
Copy link

SparkQA commented Apr 11, 2018

Test build #89167 has finished for PR 20695 at commit 21edbcd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@jkbradley jkbradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few comments. Thanks!

return SummarizerBuilder(js)


class SummarizerBuilder(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name needs to match its Scala equivalent: "SummaryBuilder"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, shouldn't we use JavaWrapper for this? That will clean up when this object is destroyed.

self._js = js

@since("2.4.0")
def summary(self, featureCol, weightCol=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: naming should match Scala: "featuresCol"

def summary(self, featureCol, weightCol=None):
"""
Returns an aggregate object that contains the summary of the column with the requested
metrics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's copy the docs for arguments & return value from Scala

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89331 has finished for PR 20695 at commit 20968c1.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SummaryBuilder(JavaWrapper):

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89333 has finished for PR 20695 at commit b91dbeb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@jkbradley jkbradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updates, just 1 new comment

"""
def __init__(self, js):
self._js = js
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should call the super's init method, and it should store js in _java_obj (which is set in the JavaWrapper init).

@SparkQA
Copy link

SparkQA commented Apr 17, 2018

Test build #89422 has finished for PR 20695 at commit 9a4a0ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

LGTM
Thanks for the PR!
Merging with master

@asfgit asfgit closed this in 1ca3c50 Apr 17, 2018
@WeichenXu123 WeichenXu123 deleted the py_summarizer branch April 17, 2018 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants