@davies
Contributor

@davies davies commented Oct 16, 2014

Added a complete Python API for mllib.feature:

Normalizer
StandardScalerModel
StandardScaler
HashingTF
IDFModel
IDF

cc @mengxr

@SparkQA

SparkQA commented Oct 16, 2014

QA tests have started for PR 2819 at commit 8a50584.

  • This patch merges cleanly.

Contributor

In the Scala/Java Word2Vec implementation, we used setters to set parameters. Should we keep the same interface on the Python side?

Contributor Author

It's good to have the same interface across languages, but sometimes it looks weird to have an API in Python that was designed for Java.

I'd like to simplify the Python API a little bit (without introducing confusion), so that Python programmers feel more at home (in this case). There are several similar cases in the pyspark.RDD API.

Does that make sense?
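
To make the two styles under discussion concrete, here is a minimal sketch. `Word2VecParams` is a hypothetical stand-in, not the actual PySpark class, and the parameter names `vectorSize` and `seed` and their defaults are assumptions chosen only for illustration.

```python
class Word2VecParams(object):
    """Hypothetical parameter holder; NOT the real pyspark.mllib.feature.Word2Vec."""

    def __init__(self, vectorSize=100, seed=42):
        # Pythonic style: parameters are passed as keyword arguments.
        self.vectorSize = vectorSize
        self.seed = seed

    def setVectorSize(self, vectorSize):
        # Scala/Java style: a setter that returns self so calls can be chained.
        self.vectorSize = vectorSize
        return self

    def setSeed(self, seed):
        # Same chaining convention as setVectorSize.
        self.seed = seed
        return self


# Setter style, mirroring the Scala/Java interface.
java_style = Word2VecParams().setVectorSize(10).setSeed(7)

# Keyword-argument style, the simplification proposed for Python.
python_style = Word2VecParams(vectorSize=10, seed=7)

print(java_style.vectorSize == python_style.vectorSize)
print(java_style.seed == python_style.seed)
```

Both calls configure the object identically; the question in this thread is only which spelling the Python API should expose.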

@SparkQA

SparkQA commented Oct 16, 2014

QA tests have started for PR 2819 at commit 486795f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 16, 2014

QA tests have finished for PR 2819 at commit 8a50584.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class VectorTransformer(object):
    • class Normalizer(VectorTransformer):
    • class JavaModelWrapper(VectorTransformer):
    • class StandardScalerModel(JavaModelWrapper):
    • class StandardScaler(object):
    • class HashTF(object):
    • class IDFModel(JavaModelWrapper):
    • class IDF(object):
    • class Word2VecModel(JavaModelWrapper):

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21784/
Test FAILed.

@SparkQA

SparkQA commented Oct 16, 2014

QA tests have finished for PR 2819 at commit 486795f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class VectorTransformer(object):
    • class Normalizer(VectorTransformer):
    • class JavaModelWrapper(VectorTransformer):
    • class StandardScalerModel(JavaModelWrapper):
    • class StandardScaler(object):
    • class HashingTF(object):
    • class IDFModel(JavaModelWrapper):
    • class IDF(object):
    • class Word2VecModel(JavaModelWrapper):

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21785/
Test FAILed.

@SparkQA

SparkQA commented Oct 16, 2014

QA tests have started for PR 2819 at commit 7a1891a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 16, 2014

QA tests have finished for PR 2819 at commit 7a1891a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class VectorTransformer(object):
    • class Normalizer(VectorTransformer):
    • class JavaModelWrapper(VectorTransformer):
    • class StandardScalerModel(JavaModelWrapper):
    • class StandardScaler(object):
    • class HashingTF(object):
    • class IDFModel(JavaModelWrapper):
    • class IDF(object):
    • class Word2VecModel(JavaModelWrapper):

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21789/
Test FAILed.

@SparkQA

SparkQA commented Oct 16, 2014

QA tests have started for PR 2819 at commit a405ae7.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 16, 2014

QA tests have finished for PR 2819 at commit a405ae7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class VectorTransformer(object):
    • class Normalizer(VectorTransformer):
    • class JavaModelWrapper(VectorTransformer):
    • class StandardScalerModel(JavaModelWrapper):
    • class StandardScaler(object):
    • class HashingTF(object):
    • class IDFModel(JavaModelWrapper):
    • class IDF(object):
    • class Word2VecModel(JavaModelWrapper):

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21790/
Test PASSed.

@davies davies changed the title from "[SPARK-3961] Python API for mllib.feature" to "[SPARK-3961] [MLlib] [PySpark] Python API for mllib.feature" Oct 16, 2014
@SparkQA

SparkQA commented Oct 17, 2014

QA tests have started for PR 2819 at commit 59781b9.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 17, 2014

QA tests have finished for PR 2819 at commit 59781b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class VectorTransformer(object):
    • class Normalizer(VectorTransformer):
    • class JavaModelWrapper(VectorTransformer):
    • class StandardScalerModel(JavaModelWrapper):
    • class StandardScaler(object):
    • class HashingTF(object):
    • class IDFModel(JavaModelWrapper):
    • class IDF(object):
    • class Word2VecModel(JavaModelWrapper):

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21850/
Test PASSed.

Member

could remove "(_)"

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22308/
Test PASSed.

@jkbradley
Member

@davies LGTM

@davies @Ishiihara This debate about whether the Python API should be Pythonic or match the Scala/Java API is tough. @mateiz has recommended the latter (match Scala/Java); it would certainly be good to converge on a community standard!

@mateiz
Contributor

mateiz commented Oct 28, 2014

So regarding the interface, as I mentioned to Joseph, I would like the interfaces to be the same so that people can easily copy code between the languages. Many people will see a Spark example in one language on a slide, and then try to do the same thing in their own program, so we want that to be super simple. So don't remove the getters and setters. In this particular case, it may be okay to support keyword args in addition to the getters / setters, since it will be obvious that there's another way to do that. But we should only do this if we're absolutely certain that these methods will have no required args in the future, because otherwise default and named arguments can mess things up.
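
One possible reading of the concern about required arguments, sketched with two hypothetical function versions (neither is real Spark code, and `newRequiredArg`, `vectorSize`, and `seed` are names assumed for illustration): if a required argument is later added ahead of the defaulted ones, a call written against the earlier signature can keep running while quietly meaning something else.

```python
def make_params_v1(vectorSize=100, seed=42):
    """Hypothetical first release: every argument has a default."""
    return {"vectorSize": vectorSize, "seed": seed}


def make_params_v2(newRequiredArg, vectorSize=100, seed=42):
    """Hypothetical later release: a required argument is added in front."""
    return {
        "newRequiredArg": newRequiredArg,
        "vectorSize": vectorSize,
        "seed": seed,
    }


# A caller written against v1 passes the vector size positionally.
old_call = make_params_v1(50)

# The same call against v2 still runs, but 50 is now bound to
# newRequiredArg and vectorSize falls back to its default.
new_call = make_params_v2(50)

print(old_call["vectorSize"])
print(new_call["vectorSize"], new_call["newRequiredArg"])
```

Setters avoid this particular failure because each parameter is always named at the call site.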

@mateiz
Contributor

mateiz commented Oct 28, 2014

BTW we can also leave out the default args for now and add them later, if we want to take more time to decide this. But the Python API should definitely include all the methods in the Scala / Java one.

@davies
Contributor Author

davies commented Oct 28, 2014

@mateiz @jkbradley @Ishiihara I have reverted the API changes in Word2Vec and also removed the keyword arguments.

@SparkQA

SparkQA commented Oct 28, 2014

Test build #22316 has started for PR 2819 at commit b628693.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 28, 2014

Test build #22316 has finished for PR 2819 at commit b628693.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class VectorTransformer(object):
    • class Normalizer(VectorTransformer):
    • class JavaModelWrapper(VectorTransformer):
    • class StandardScalerModel(JavaModelWrapper):
    • class StandardScaler(object):
    • class HashingTF(object):
    • class IDFModel(JavaModelWrapper):
    • class IDF(object):
    • class Word2VecModel(JavaModelWrapper):

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22316/
Test PASSed.

Contributor

Vector is not used

@SparkQA

SparkQA commented Oct 28, 2014

Test build #22346 has started for PR 2819 at commit 4f48f48.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 28, 2014

Test build #22346 has finished for PR 2819 at commit 4f48f48.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class VectorTransformer(object):
    • class Normalizer(VectorTransformer):
    • class JavaModelWrapper(VectorTransformer):
    • class StandardScalerModel(JavaModelWrapper):
    • class StandardScaler(object):
    • class HashingTF(object):
    • class IDFModel(JavaModelWrapper):
    • class IDF(object):
    • class Word2VecModel(JavaModelWrapper):

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22346/
Test PASSed.

@mengxr
Contributor

mengxr commented Oct 28, 2014

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in fae095b Oct 28, 2014
