
Commit 8b00bd7

Ignore training samples shorter than the maximum kernel size
1 parent 0ea6819 commit 8b00bd7

2 files changed: +15 -1 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ Fork of Shawn Ng's [CNNs for Sentence Classification in PyTorch](https://github.
 * Weights are represented by upsampling.
 * Only supports pre-trained word vectors from TorchText.
 * The random_state parameter probably only works with integers or None.
+* Training samples shorter than the maximum kernel size are ignored.
+* Test samples shorter than the maximum kernel size are classified as the most common class found during training.
 * Features my idiosyncratic coding style.

 ## To Do
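
The second new README bullet describes a test-time fallback that this diff does not show. A minimal sketch of how such a fallback could work, assuming whitespace tokenization and a hypothetical `classify` callable standing in for the trained model (neither name comes from the repository):

```python
from collections import Counter


def most_common_class(y_train):
    """Return the label that occurs most often in the training labels."""
    return Counter(y_train).most_common(1)[0][0]


def predict_with_fallback(texts, classify, fallback, min_len, tokenize=str.split):
    """Classify each text, using `fallback` for texts with fewer than `min_len` tokens.

    `classify` and `tokenize` are hypothetical stand-ins for the model's
    prediction step and the project's own text preprocessing.
    """
    return [
        classify(text) if len(tokenize(text)) >= min_len else fallback
        for text in texts
    ]
```

Here `fallback` would be computed once from the training labels with `most_common_class` and reused for every short test sample.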

cnn_text_classification.py

Lines changed: 13 additions & 1 deletion
@@ -158,9 +158,21 @@ def __preprocess(self, X, y, sample_weight):
         self.__text_field = Field(lower=True)
         self.__label_field = Field(sequential=False)
         self.__text_field.preprocessing = Pipeline(self.__preprocess_text)
+        max_krnl_sz = int(self.kernel_sizes[self.kernel_sizes.rfind(",") + 1:])
+        X, y = list(X), list(y)
+        sample_weight = None if sample_weight is None else list(sample_weight)
+
+        for i in range(len(X) - 1, -1, -1):
+            if len(self.__text_field.preprocess(X[i])) < max_krnl_sz:
+                del X[i]
+                del y[i]
+
+                if sample_weight is not None:
+                    del sample_weight[i]
+
         fields = [("text", self.__text_field), ("label", self.__label_field)]
-        weights = [1 for yi in y] if sample_weight is None else sample_weight
         exmpl = [Example.fromlist([X[i], y[i]], fields) for i in range(len(X))]
+        weights = [1 for yi in y] if sample_weight is None else sample_weight

         if self.class_weight is not None:
             cw = self.class_weight
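
The hunk above reads the maximum kernel size as the number after the last comma in `kernel_sizes`, which is the largest value only if the sizes are listed in ascending order. A standalone sketch of the same filtering step that takes the maximum over all listed sizes instead; the function names are hypothetical, and whitespace splitting is an assumed stand-in for `self.__text_field.preprocess`:

```python
def max_kernel_size(kernel_sizes):
    """Largest size in a comma-separated string such as "3,4,5"."""
    return max(int(k) for k in kernel_sizes.split(","))


def drop_short_samples(X, y, sample_weight, kernel_sizes, tokenize=str.split):
    """Drop samples with fewer tokens than the largest kernel size.

    `tokenize` is an assumption: the commit uses the TorchText field's own
    preprocessing, which may produce a different token count.
    """
    X, y = list(X), list(y)
    min_len = max_kernel_size(kernel_sizes)

    # Indices of the samples that are long enough to keep.
    keep = [i for i, text in enumerate(X) if len(tokenize(text)) >= min_len]

    X_kept = [X[i] for i in keep]
    y_kept = [y[i] for i in keep]

    if sample_weight is None:
        w_kept = None
    else:
        sample_weight = list(sample_weight)
        w_kept = [sample_weight[i] for i in keep]

    return X_kept, y_kept, w_kept
```

Building new lists from the kept indices avoids deleting from three lists while iterating backwards, at the cost of one extra pass over the data.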
