_setDefault uses typeConverter
sethah committed Apr 14, 2016
commit 0c0fc63411d13a02b881c4218889a7c8a9bd1866
python/pyspark/ml/feature.py (2 changes: 1 addition & 1 deletion)
@@ -1721,7 +1721,7 @@ def __init__(self, inputCol=None, outputCol=None, stopWords=None,
        self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.StopWordsRemover",
                                            self.uid)
        stopWordsObj = _jvm().org.apache.spark.ml.feature.StopWords
-       defaultStopWords = stopWordsObj.English()
+       defaultStopWords = list(stopWordsObj.English())
Contributor Author

With the change to _setDefault, I had to change this default to be a list instead of a JavaObject. The other option would be to have the type converters do nothing when they encounter a JavaObject. It is nice to be able to leave the stop words as a JavaObject when they are never accessed explicitly on the Python side. I would appreciate thoughts on this problem.
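For context, a minimal sketch of why the list(...) wrapping matters, assuming the stopWords param uses TypeConverters.toListString as its converter (as it does in current PySpark). The converter only accepts list-like Python values, so a Py4J JavaObject passed as a default would be rejected:

```python
from pyspark.ml.param import TypeConverters

# A plain Python list converts cleanly.
print(TypeConverters.toListString(["a", "an", "the"]))  # ['a', 'an', 'the']

# A value that is not list-like (here object() stands in for a py4j
# JavaObject holding the JVM-side default) raises TypeError, which is why
# the default above is round-tripped through list(...).
try:
    TypeConverters.toListString(object())
except TypeError as e:
    print(e)
```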

Contributor

I don't think we ever explicitly access them on the Python side, although a user's application might attempt to do that and append stop words to the existing list, in which case having it as a list is probably good. One could get a similar effect by changing getStopWords, without having to round-trip the list in cases where it is never accessed on the Python side.

Contributor Author

It is simple to make this change, so I think it's a good idea. This will help in the future for similar cases, or if the list of stop words grows even larger. I changed getStopWords to always return a list, which I think is better for users. Thanks for the suggestion!
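A rough sketch of what that getStopWords change on StopWordsRemover could look like (the actual implementation in the PR may differ); the idea is to round-trip the JVM-side value into a Python list only when a caller actually reads it:

```python
def getStopWords(self):
    """
    Gets the value of stopWords or its default value, always as a Python list.
    """
    stopWords = self.getOrDefault(self.stopWords)
    if not isinstance(stopWords, list):
        # The stored value may still be the JVM-side default (an iterable
        # py4j object); convert it lazily, only when it is actually accessed.
        stopWords = list(stopWords)
    return stopWords
```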

        self._setDefault(stopWords=defaultStopWords, caseSensitive=False)
        kwargs = self.__init__._input_kwargs
        self.setParams(**kwargs)
python/pyspark/ml/param/__init__.py (9 changes: 8 additions & 1 deletion)
@@ -444,7 +444,14 @@ def _setDefault(self, **kwargs):
        Sets default params.
        """
        for param, value in kwargs.items():
-           self._defaultParamMap[getattr(self, param)] = value
+           p = getattr(self, param)
Contributor Author

In a previous PR a parameter was given an incorrect type converter, and this was not caught by the tests. Having _setDefault apply the param's type converter ensures that no param with a default value can be given an incompatible type converter.
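As an illustration, a small sketch of how the stricter _setDefault surfaces such a mismatch as soon as the default is set, assuming maxIter's converter is TypeConverters.toInt (as in the shared params):

```python
from pyspark.ml.param.shared import HasMaxIter


class BadDefaults(HasMaxIter):
    def __init__(self):
        super(BadDefaults, self).__init__()
        # maxIter's converter cannot turn a string into an int, so with the
        # change above this fails right here, at definition time, with:
        #   TypeError: Invalid default param value given for param "maxIter". ...
        self._setDefault(maxIter="not a number")


BadDefaults()
```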

Member

Nice catch

+           if value is not None:
+               try:
+                   value = p.typeConverter(value)
+               except TypeError as e:
+                   raise TypeError('Invalid default param value given for param "%s". %s'
+                                   % (p.name, e))
+           self._defaultParamMap[p] = value
        return self

    def _copyValues(self, to, extra=None):