-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Added subroutine to run an NLTK Stemmer #149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! This is a good feature. In addition to addressing my comments, can you also add tests for this change?
textblob/blob.py
Outdated
| #added 'stemmer' on lines of lemmatizer | ||
| #based on nltk | ||
| def stem(self, isporter=True): | ||
| #param isporter: True when using Porter stemmer, else Snowball |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document this method with a docstring, using the Sphinx syntax (see other methods for examples).
| lemmatizer = nltk.stem.WordNetLemmatizer() | ||
| return lemmatizer.lemmatize(self.string, pos) | ||
|
|
||
| #added 'stemmer' on lines of lemmatizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments are unnecessary; remove them.
textblob/blob.py
Outdated
|
|
||
| #added 'stemmer' on lines of lemmatizer | ||
| #based on nltk | ||
| def stem(self, isporter=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other stemmers available in NLTK? If so, I'm not sure a boolean param will suffice here. Perhaps the user could pass an instance of a stemmer, e.g.
blob.stem(stemmer=PorterStemmer())There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking much better! Just need to fix the test.
Also, there are a lot of unnecessary commits here--would you mind squashing your commits down? If you don't know how or don't have the time, don't worry too much about it.
tests/test_blob.py
Outdated
| w = tb.Word("wolves") | ||
| assert_equal(w.stem(), "wolv") | ||
| w = tb.Word("went") | ||
| assert_equal(w.stem("v"), "went") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is failing because of the invalid "v" argument.
|
thanks man!
i've pushed changes. i also altered SentenceTokenizer() - i wasn't
particularly aiming for that, just hoping to solve an issue - but those
changes are harmless
Regards,
*Nitish Kulshrestha | +91 915 866 4202*
Core Member
Alumni Affairs Division
*BITS Pilani K. K. Birla Goa CampusNH17-B, ZuarinagarGoa- 403726India*
…On Thu, Feb 9, 2017 at 8:45 AM, Steven Loria ***@***.***> wrote:
***@***.**** requested changes on this pull request.
Looking much better! Just need to fix the test.
Also, there are a lot of unnecessary commits here--would you mind
squashing your commits down? If you don't know how or don't have the time,
don't worry too much about it.
------------------------------
In tests/test_blob.py
<#149 (review)>:
> @@ -914,6 +918,14 @@ def test_lemma(self):
w = tb.Word("went", "VBD");
assert_equal(w.lemma, "go")
+ def test_stem(self): #only PorterStemmer tested
+ w = tb.Word("cars")
+ assert_equal(w.stem(), "car")
+ w = tb.Word("wolves")
+ assert_equal(w.stem(), "wolv")
+ w = tb.Word("went")
+ assert_equal(w.stem("v"), "went")
This assertion is failing because of the invalid "v" argument.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#149 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AULAGJrBKcZ8u-uy4mEBrLNR_fLJ3iIzks5raoTpgaJpZM4LztMV>
.
|
|
The tokenization changes are unrelated to this PR; please remove them. |
|
The added test is failing; can you please look into it? |
|
@sloria I have fixed errors in the stemmer, rest seem to be errors in TextBlob itself. Please look into it. |
|
Yes, the errors are unrelated to this PR. Thanks for making those changes. This looks good to go. |
So I saw an issue about TextBlob using a lemmatizer but not a stemmer.
Wrote a function for the later.
@sloria