-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-3961] [MLlib] [PySpark] Python API for mllib.feature #2819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
QA tests have started for PR 2819 at commit
|
python/pyspark/mllib/feature.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Scala/Java Word2Vec implementation , we used setters to set parameters, should we keep the same interface at python side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good to have same interface crossing languages, but sometimes it looks wired to having the API that is designed for Java.
I'd like to simply the Python API a little bit (without introducing confusing), then Python programmer can feel better (in this case). We can find several similar cases in APIs of pyspark.RDD.
Does it make sense?
|
QA tests have started for PR 2819 at commit
|
|
QA tests have finished for PR 2819 at commit
|
|
Test FAILed. |
|
QA tests have finished for PR 2819 at commit
|
|
Test FAILed. |
|
QA tests have started for PR 2819 at commit
|
|
QA tests have finished for PR 2819 at commit
|
|
Test FAILed. |
|
QA tests have started for PR 2819 at commit
|
|
QA tests have finished for PR 2819 at commit
|
|
Test PASSed. |
Conflicts: python/pyspark/mllib/feature.py
|
QA tests have started for PR 2819 at commit
|
|
QA tests have finished for PR 2819 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could remove "(_)"
|
Test PASSed. |
|
@davies LGTM @davies @Ishiihara This debate about whether the Python API should be Pythonic or match the Scala/Java API is tough. @mateiz has recommended the latter (match Scala/Java); it would certainly be good to converge on a community standard! |
|
So regarding the interface, as I mentioned to Joseph, I would like the interfaces to be the same so that people can easily copy code between the languages. Many people will see a Spark example in one language on a slide, and then try to do the same thing in their own program, so we want what to be super simple. So don't remove the getters and setters. In this particular case, it may be okay to support keyword args in addition to the getters / setters, since it will be obvious that there's another way to do that. But we should only do this if we're absolutely certain that these methods will have no required args in the future, because otherwise default and named arguments can mess things out. |
|
BTW we can also leave out the default args for now and add them later, if we want to take more time to decide this. But the Python API should definitely include all the methods in the Scala / Java one. |
|
@mateiz @jkbradley @Ishiihara I had revert the API changes in Word2Vec, also remove the keyword arguments. |
|
Test build #22316 has started for PR 2819 at commit
|
|
Test build #22316 has finished for PR 2819 at commit
|
|
Test PASSed. |
docs/mllib-feature-extraction.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vector is not used
|
Test build #22346 has started for PR 2819 at commit
|
|
Test build #22346 has finished for PR 2819 at commit
|
|
Test PASSed. |
|
LGTM. Merged into master. Thanks! |
Added completed Python API for MLlib.feature
Normalizer
StandardScalerModel
StandardScaler
HashTF
IDFModel
IDF
cc @mengxr