[SPARK-10027] [ML] [PySpark] Add Python API missing methods for ml.feature #8313
|
Test build #41248 has finished for PR 8313 at commit
|
|
Exposing labels from |
Some other classes (e.g. StandardScaler, RegexTokenizer) have no shared accessor class for their own properties. It might be better to keep handleInvalid inside StringIndexer only, or to create shared accessor classes for StandardScaler etc. as well, so that such properties are accessed in a consistent way.
handleInvalid is a common property that other Transformers/Estimators may use in the future, so I think we need to make it a shared param. Another reason is that we want to keep Python consistent with Scala, where handleInvalid is a shared param.
@jkbradley
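To make the trade-off in this thread concrete, here is a minimal pure-Python sketch of the shared-accessor idea. It is not the PySpark implementation: `Params`, `ToyStringIndexer`, and `ToyOtherEstimator` are hypothetical stand-ins, and the `"error"` default placed on the mixin is an assumption made only for illustration.

```python
class Params:
    """Tiny stand-in for a params container (hypothetical, not pyspark.ml.param.Params)."""

    def __init__(self):
        self._param_map = {}

    def _set(self, **kwargs):
        # Store the given params and return self so setters can be chained.
        self._param_map.update(kwargs)
        return self

    def _get(self, name, default=None):
        return self._param_map.get(name, default)


class HasHandleInvalid(Params):
    """Shared accessor mixin: every class that inherits it exposes the same API."""

    def setHandleInvalid(self, value):
        return self._set(handleInvalid=value)

    def getHandleInvalid(self):
        # Assumed default for this sketch only.
        return self._get("handleInvalid", "error")


class ToyStringIndexer(HasHandleInvalid):
    """Hypothetical estimator that reuses the shared accessor."""


class ToyOtherEstimator(HasHandleInvalid):
    """A second hypothetical estimator gets the same accessors for free."""
```

With the mixin, a later estimator that needs handleInvalid inherits identical setter and getter names; keeping the param inside one class instead would mean re-declaring it wherever it is needed next.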
|
Maybe add a test for numFeatures & categoryMaps? |
|
@holdenk Agree, done. |
|
Test build #42183 has finished for PR 8313 at commit
|
|
Test build #42199 has finished for PR 8313 at commit
|
|
Test build #42201 has finished for PR 8313 at commit
|
The Python API methods missing from ml.feature are listed here:
- `StringIndexer` lacks the parameter `handleInvalid`.
- `StringIndexerModel` lacks the method `labels`.
- `VectorIndexerModel` lacks the methods `numFeatures` and `categoryMaps`.
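For readers unfamiliar with what `labels` and `handleInvalid` mean, the following is a rough pure-Python sketch of the behaviour, not PySpark code. `fit_labels` and `transform` are hypothetical helper names; the sketch assumes labels are ordered by descending frequency and that `handleInvalid` accepts `"error"` and `"skip"`. The alphabetical tie-break is an assumption added only to keep the output deterministic.

```python
from collections import Counter


def fit_labels(values):
    """Return the distinct labels, most frequent first (what `labels` would expose)."""
    counts = Counter(values)
    # Ties are broken alphabetically; this tie-break is an assumption of the sketch.
    return sorted(counts, key=lambda v: (-counts[v], v))


def transform(values, labels, handle_invalid="error"):
    """Map each value to its label index, handling unseen values per handle_invalid."""
    if handle_invalid not in ("error", "skip"):
        raise ValueError("handle_invalid must be 'error' or 'skip'")
    index = {label: float(i) for i, label in enumerate(labels)}
    out = []
    for v in values:
        if v in index:
            out.append((v, index[v]))
        elif handle_invalid == "error":
            raise ValueError("Unseen label: %s" % v)
        # handle_invalid == "skip": the row with the unseen label is dropped.
    return out
```

Fitting on `["a", "b", "c", "a", "a", "c"]` gives the label order `["a", "c", "b"]`; transforming `["a", "d", "b"]` with `"skip"` drops the unseen `"d"`, while `"error"` raises instead.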