
Commit 1646912

Whoosh full-text index search and Chinese word segmentation
1 parent b375683 commit 1646912

1 file changed

+7
-0
lines changed

index_search_whoosh.md

Lines changed: 7 additions & 0 deletions
@@ -141,3 +141,10 @@ title/path/content are the so-called fields. Each field corresponds to the index search target
 {"title":u"my second document","path":u"/a"}
 
 Both of the fields above were set to stored=True earlier.
+
+## Chinese word segmentation
+
+For Chinese word segmentation, jieba (结巴分词) is a good choice. The following two resources address Chinese text analysis:
+
+- [jieba (结巴分词)](https://github.com/qiwsir/jieba)
+- [Whoosh and jieba](https://github.com/qiwsir/algorithm/blob/master/chinesetokenizer.py)
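
Chinese text has no spaces between words, so a full-text index needs a segmentation step before it can store searchable terms. The following is a minimal sketch of that idea only. It uses forward maximum matching with a toy vocabulary, which is a swapped-in technique for illustration and not jieba's algorithm or the linked `chinesetokenizer.py`. The names `segment`, `VOCAB`, and `max_len` are hypothetical.

```python
def segment(text, vocabulary, max_len=4):
    """Split text into words by forward maximum matching.

    At each position the longest substring found in `vocabulary` is taken;
    if nothing matches, a single character is emitted as its own token.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary; a real segmenter ships a large dictionary.
VOCAB = {"中文", "分词", "全文", "索引", "查找"}

if __name__ == "__main__":
    print(segment("全文索引查找和中文分词", VOCAB))
```

Each resulting token is what an index would store as a term for a field. In practice a dedicated library such as jieba would be expected to handle ambiguity and unknown words far better than this dictionary lookup.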
