A Chinese term classifier based on web search results
Files in this repository include:
- Scripts to retrieve term features from search engines, cleanse the raw feature lists, and sample the data
- Dictionary used for Chinese text segmentation
- Sample data sets include:
  - Input term sets, in files with names beginning with drugList-
  - Raw term-feature matrices generated from different search engines and term sets, in files with names beginning with drugFeature-
  - The exact testing and training sets used for this study, in files with names containing -TestTrain
You will need 7-Zip (http://www.7-zip.org/) to decompress the files.
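
If you prefer to script the decompression step, the following is a minimal Python sketch that shells out to the 7-Zip command-line tool. It assumes the `7z` (or `7za`) executable is on your PATH. The archive name and output directory in the usage note are hypothetical placeholders, not actual file names from this repository.

```python
import shutil
import subprocess


def build_extract_command(archive, out_dir, exe="7z"):
    """Return the 7-Zip command line that extracts `archive` into `out_dir`."""
    # `x` extracts with full paths, `-o<dir>` sets the output directory
    # (no space after -o), and `-y` answers "yes" to any prompts.
    return [exe, "x", str(archive), f"-o{out_dir}", "-y"]


def extract(archive, out_dir="data"):
    """Extract one archive, raising if 7-Zip is missing or extraction fails."""
    # Assumption: the command-line binary is installed as `7z` or `7za`.
    exe = shutil.which("7z") or shutil.which("7za")
    if exe is None:
        raise RuntimeError("7-Zip not found on PATH; see http://www.7-zip.org/")
    subprocess.run(build_extract_command(archive, out_dir, exe), check=True)
```

For example, `extract("some-archive.7z", "data")` would unpack a (hypothetical) archive named `some-archive.7z` into a `data` directory. Substitute the real archive names from this repository.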