@@ -57,6 +57,27 @@ To increase the global concurrency use::
5757
5858 CONCURRENT_REQUESTS = 100
5959
60+ Increase Twisted IO thread pool maximum size
61+ ============================================
62+
63+ Currently Scrapy resolves DNS names in a blocking way, using a thread pool.
64+ At higher concurrency levels, crawling can become slow or even fail by
65+ hitting DNS resolver timeouts. A possible solution is to increase the number
66+ of threads handling DNS queries. The DNS queue is then processed faster,
67+ speeding up connection establishment and crawling overall.
68+
69+ To increase the maximum thread pool size use::
70+
71+ REACTOR_THREADPOOL_MAXSIZE = 20
72+
73+ Set up your own DNS
74+ ===================
75+
76+ If you have multiple crawling processes and a single central DNS server, the
77+ crawl can act like a DoS attack on it, slowing down the entire network or even
78+ getting your machines blocked. To avoid this, set up your own DNS server with a
79+ local cache and an upstream to a large DNS provider such as OpenDNS or Verizon.
80+
6081Reduce log level
6182================
6283
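A rough way to see why the larger thread pool in this change helps is the sketch below. It is an idealized model, not Scrapy's or Twisted's actual scheduling: it assumes every DNS lookup blocks one thread for the same fixed time and that the pool does nothing else. The setting names and values come from the diff; `estimated_drain_seconds`, the baseline pool size of 10, and the 0.5 second lookup time are hypothetical and chosen only for illustration.

```python
import math

# Setting names and values are taken from the diff above; a plain dict keeps
# the sketch runnable without Scrapy installed.
BROAD_CRAWL_SETTINGS = {
    "CONCURRENT_REQUESTS": 100,
    "REACTOR_THREADPOOL_MAXSIZE": 20,
}


def estimated_drain_seconds(pending_lookups, pool_size, lookup_seconds):
    """Idealized time to resolve all pending lookups (hypothetical helper).

    Assumes each lookup blocks one thread for exactly lookup_seconds and
    that the thread pool is used for nothing but DNS resolution.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    # Lookups are processed in batches of at most pool_size at a time.
    batches = math.ceil(pending_lookups / pool_size)
    return batches * lookup_seconds


if __name__ == "__main__":
    pending = BROAD_CRAWL_SETTINGS["CONCURRENT_REQUESTS"]
    # Compare an assumed baseline of 10 threads with the value from the diff.
    for size in (10, BROAD_CRAWL_SETTINGS["REACTOR_THREADPOOL_MAXSIZE"]):
        print(size, estimated_drain_seconds(pending, size, 0.5))
```

Under these assumptions, doubling the pool size halves the time needed to clear a full queue of lookups. Real resolvers vary in latency and cache results, so the numbers are only indicative.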