
Commit b794cda

Broad crawls notes.
1 parent e7b274e commit b794cda


1 file changed (+21 -0 lines)

docs/topics/broad-crawls.rst

Lines changed: 21 additions & 0 deletions
@@ -57,6 +57,27 @@ To increase the global concurrency use::

    CONCURRENT_REQUESTS = 100

Increase Twisted IO thread pool maximum size
============================================

Currently Scrapy does DNS resolution in a blocking way, using a thread pool.
With higher concurrency levels the crawling can be slow or even fail by hitting
DNS resolver timeouts. A possible solution is to increase the number of threads
handling DNS queries: the DNS queue is then processed faster, which speeds up
establishing connections and the crawl overall.

To increase the maximum thread pool size, use::

    REACTOR_THREADPOOL_MAXSIZE = 20

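The effect of the pool size on a queue of blocking lookups can be illustrated
with a small, self-contained Python simulation. This is a sketch under stated
assumptions, not Scrapy or Twisted code: `fake_blocking_lookup` and
`drain_dns_queue` are hypothetical helpers, `time.sleep` stands in for a
blocking resolver call, and every lookup is assumed to take the same fixed
time. Under those assumptions, draining N lookups on a pool of T threads takes
roughly ceil(N / T) times the per-lookup latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_blocking_lookup(hostname: str, latency: float) -> str:
    """Hypothetical stand-in for a blocking DNS query; no network I/O happens."""
    time.sleep(latency)  # the calling thread is blocked for the whole "query"
    return f"resolved {hostname}"


def drain_dns_queue(hostnames, pool_size: int, latency: float = 0.05) -> float:
    """Resolve every hostname on a pool of `pool_size` threads.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # At most `pool_size` lookups run at once; the rest wait in the queue.
        list(pool.map(lambda h: fake_blocking_lookup(h, latency), hostnames))
    return time.perf_counter() - start


if __name__ == "__main__":
    hosts = [f"site{i}.example" for i in range(40)]
    for size in (5, 20):
        print(f"pool size {size:2d}: {drain_dns_queue(hosts, size):.2f} s")
```

The larger pool should finish sooner in this simulation, although real resolver
latencies vary, and more threads also mean more simultaneous queries sent to the
DNS server.
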
Set up your own DNS
===================

If you have multiple crawling processes and a single central DNS server, their
combined queries can act like a DoS attack on that server, slowing down the
entire network or even getting your machines blocked. To avoid this, set up your
own DNS server with a local cache, forwarding queries upstream to a large DNS
provider such as OpenDNS or Verizon.

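A caching DNS server is normally a separate program, so the sketch below is only
an in-process Python analogy of the caching idea. All names (`CachingResolver`,
`fake_upstream`) are hypothetical, the upstream is a fake function rather than a
real DNS query, and a single fixed `ttl` is an assumption (real DNS caches honour
the TTL of each record). It shows why a local cache relieves the upstream
server: repeated lookups of the same hostname within the TTL are answered
locally, so only the first one goes upstream.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical upstream resolver signature: hostname -> list of IP strings.
Resolver = Callable[[str], List[str]]


class CachingResolver:
    """Minimal in-process TTL cache in front of an upstream resolver (illustrative)."""

    def __init__(self, upstream: Resolver, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._upstream = upstream
        self._ttl = ttl  # assumption: one fixed TTL for every entry
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, hostname: str) -> List[str]:
        now = self._clock()
        entry = self._cache.get(hostname)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: nothing is sent upstream
        addresses = self._upstream(hostname)  # miss or expired entry
        self._cache[hostname] = (now + self._ttl, addresses)
        return addresses


if __name__ == "__main__":
    upstream_queries: List[str] = []

    def fake_upstream(hostname: str) -> List[str]:
        upstream_queries.append(hostname)  # record every upstream query
        return ["192.0.2.1"]  # documentation-range address, not a real lookup

    resolver = CachingResolver(fake_upstream, ttl=60.0)
    for _ in range(1000):  # many requests to the same site
        resolver.resolve("www.example.com")
    print(len(upstream_queries))  # → 1
```
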
Reduce log level
================
