Crawls ria.ru with scrapy

- `pip install -r requirements.txt`
- Set up a `splash` instance, e.g. `docker run -p 8050:8050 scrapinghub/splash`
- `scrapy crawl article_spider`
  - Crawl data
- Run `stemmer.py`
  - Run stemmer on some data
- Run `search.py`
  - Enter search query
  - A list of article titles will be returned, sorted by total term count
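
To illustrate what "sorted by total term count" could mean, here is a minimal sketch, not the actual `search.py`. It assumes each article is a `(title, body)` pair and that the score is the summed number of occurrences of the query terms in the body. The names `tokenize` and `rank_titles` and the sample data are hypothetical, and the sketch only lowercases words where the real pipeline would presumably apply the stemmer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (no stemming here)."""
    return re.findall(r"\w+", text.lower())


def rank_titles(articles, query):
    """Return titles sorted by how often the query terms occur in each body.

    `articles` is a list of (title, body) pairs. Articles without any
    matching term are dropped (an assumption of this sketch).
    """
    terms = set(tokenize(query))
    scored = []
    for title, body in articles:
        counts = Counter(tokenize(body))
        total = sum(counts[term] for term in terms)
        if total > 0:
            scored.append((total, title))
    # Highest total term count first; ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Hypothetical sample data, not taken from a real crawl.
articles = [
    ("Title A", "rocket launch delayed, launch window moved"),
    ("Title B", "weather report for the weekend"),
    ("Title C", "rocket rocket launch launch launch"),
]
print(rank_titles(articles, "rocket launch"))  # ['Title C', 'Title A']
```

Since `\w` matches Unicode word characters in Python 3, the same tokenizer also handles Cyrillic text such as ria.ru articles.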