@jrbarron jrbarron commented Sep 7, 2016

In some cases, it may be useful to let people specify their own tokenizer, in the same way they can already specify the sanitizer. The tokenizer cannot be added to the "constructor" function, since that would break API compatibility, so this adds a public setter instead.

The primary use case is to allow people to pre-process the tokens, for example by running them through a stemmer. Without support for a custom tokenizer, the only way to accomplish this is to split, stem, and re-join the sentence, only for it to be split again, which is suboptimal.
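A minimal sketch of the idea, not the merged code: every name here (`Tokenizer`, `Model`, `NewModel`, `SetTokenizer`, `stemmingTokenizer`, `naiveStem`) is hypothetical and chosen only for illustration, and the sanitizer convention (return `true` to keep a rune) is an assumption. It shows a constructor whose signature stays unchanged, a setter that swaps the tokenizer, and a tokenizer that stems while it splits so the sentence is only split once.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Tokenizer splits a sentence into tokens.
// Hypothetical interface; the real project may name or shape it differently.
type Tokenizer interface {
	Tokenize(sentence string) []string
}

// whitespaceTokenizer stands in for a default tokenizer.
type whitespaceTokenizer struct{}

func (whitespaceTokenizer) Tokenize(sentence string) []string {
	return strings.Fields(sentence)
}

// stemmingTokenizer splits on whitespace and stems each token in the
// same pass, so no split / stem / re-join / split round trip is needed.
type stemmingTokenizer struct {
	stem func(string) string
}

func (t stemmingTokenizer) Tokenize(sentence string) []string {
	words := strings.Fields(sentence)
	for i, w := range words {
		words[i] = t.stem(w)
	}
	return words
}

// naiveStem is a toy stemmer (an assumption, not a real algorithm such as
// Porter): it lowercases the word and strips one trailing "ing" or "s".
func naiveStem(word string) string {
	word = strings.ToLower(word)
	for _, suffix := range []string{"ing", "s"} {
		if len(word) > len(suffix)+2 && strings.HasSuffix(word, suffix) {
			return strings.TrimSuffix(word, suffix)
		}
	}
	return word
}

// Model is a hypothetical stand-in for the classifier.
type Model struct {
	keep      func(rune) bool // sanitizer: assumed to return true for runes to keep
	tokenizer Tokenizer
}

// NewModel keeps its original one-argument signature, so existing callers
// still compile; it installs the default tokenizer.
func NewModel(keep func(rune) bool) *Model {
	return &Model{keep: keep, tokenizer: whitespaceTokenizer{}}
}

// SetTokenizer is the public setter that replaces the tokenizer.
func (m *Model) SetTokenizer(t Tokenizer) {
	m.tokenizer = t
}

// Tokens sanitizes the sentence, then hands it to the current tokenizer.
func (m *Model) Tokens(sentence string) []string {
	clean := strings.Map(func(r rune) rune {
		if m.keep(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, sentence)
	return m.tokenizer.Tokenize(clean)
}

func main() {
	lettersAndSpaces := func(r rune) bool {
		return unicode.IsLetter(r) || unicode.IsSpace(r)
	}
	m := NewModel(lettersAndSpaces)
	m.SetTokenizer(stemmingTokenizer{stem: naiveStem})
	fmt.Println(m.Tokens("Jumping cats!"))
}
```

Callers that never call the setter keep the default behaviour, which is the point of adding a setter rather than a new constructor argument.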

@cdipaolo cdipaolo merged commit 4b2f5a3 into cdipaolo:master Sep 17, 2016
@cdipaolo
Owner

Hey, thanks a bunch!!! And sorry this took so long to get integrated; my notification got buried under a bunch of other emails 😨
