fix: try punkt fallback to punkt_tab in split_sentences (NLTK >= 3.9)
danishashko committed Apr 1, 2026
commit f50ceaf29877fb552f72f5b1108b99e5ddfef3a0
5 changes: 4 additions & 1 deletion newspaper/nlp.py
@@ -154,7 +154,10 @@ def split_sentences(text):
     """Split a large string into sentences
     """
     import nltk.data
-    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
+    try:
+        tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
+    except LookupError:
+        tokenizer = nltk.data.load('tokenizers/punkt_tab/english')
 
     sentences = tokenizer.tokenize(text)
     sentences = [x.replace('\n', '') for x in sentences if len(x) > 10]
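Review note: a sketch of the same fallback, not a replacement for the committed change. One thing worth checking before merging: as far as I can tell, `nltk.data.load` picks its loader from the file extension, so pointing it at the `punkt_tab/english` directory may raise something other than the tokenizer we want. The variant below instead assumes `nltk.tokenize.PunktTokenizer` is available in the installed NLTK (it arrived together with `punkt_tab`; on older releases the import would fail with `ImportError`, which is not caught here). The helper names `first_available` and `load_sentence_tokenizer` are hypothetical and do not exist in `newspaper/nlp.py`.

```python
def first_available(loaders):
    """Return the result of the first loader that does not raise LookupError."""
    last_error = None
    for load in loaders:
        try:
            return load()
        except LookupError as exc:
            # Resource not installed; remember why and try the next loader.
            last_error = exc
    raise LookupError("no sentence tokenizer data found") from last_error


def load_sentence_tokenizer(language="english"):
    """Try the legacy punkt pickle first, then the punkt_tab data."""
    # Imported lazily so this module can be imported without NLTK installed.
    import nltk.data

    def legacy_punkt():
        # Same call as the original code: pickled model from the "punkt" package.
        return nltk.data.load(f"tokenizers/punkt/{language}.pickle")

    def punkt_tab():
        # Assumption: PunktTokenizer exists in this NLTK version and reads
        # the tab-separated data from the "punkt_tab" package.
        from nltk.tokenize import PunktTokenizer
        return PunktTokenizer(language)

    return first_available([legacy_punkt, punkt_tab])


def split_sentences(text, tokenizer=None):
    """Split a large string into sentences, dropping very short fragments."""
    if tokenizer is None:
        tokenizer = load_sentence_tokenizer()
    sentences = tokenizer.tokenize(text)
    return [x.replace("\n", "") for x in sentences if len(x) > 10]
```

Accepting an optional `tokenizer` argument keeps the filtering logic testable without any NLTK data installed; callers that pass nothing get the same behaviour as before, with the fallback applied.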