分析爬取的过程:
[root@CentOS ~]# git clone https://github.com/wangy8961/python3-concurrency-pics-02.git
[root@CentOS ~]# cd python3-concurrency-pics-02/如果你的操作系统是Linux:
[root@CentOS python3-concurrency-pics-02]# python3 -m venv venv3
[root@CentOS python3-concurrency-pics-02]# source venv3/bin/activate
Windows激活虚拟环境的命令是:venv3\Scripts\activate
如果你的操作系统是Linux:
(venv3) [root@CentOS python3-concurrency-pics-02]# pip install -r requirements-linux.txt如果你的操作系统是Windows(不会使用uvloop):
(venv3) C:\Users\wangy> pip install -r requirements-win32.txt由于图片有13万多张,所以测试的时候,你可以指定只下载100个图集来对比同步下载、多线程下载和异步下载的效率区别,修改以下三个脚本中的TEST_NUM = 100
建议每次测试完,都删除相关目录:
(venv3) [root@CentOS python3-concurrency-pics-02]# rm -rf downloads/ logs/ __pycache__/删除数据库记录:
(venv3) [root@CentOS python3-concurrency-pics-02]# mongo
MongoDB shell version v3.6.6
connecting to: mongodb://127.0.0.1:27017
...
> show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
mzitu 0.036GB
> use mzitu
switched to db mzitu
> db.dropDatabase()
{ "dropped" : "mzitu", "ok" : 1 }
> show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
> (venv3) [root@CentOS python3-concurrency-pics-02]# python sequential.py(venv3) [root@CentOS python3-concurrency-pics-02]# python threadpool.py(venv3) [root@CentOS python3-concurrency-pics-02]# python asynchronous.py- Python3爬虫系列01 (理论) - I/O Models 阻塞 非阻塞 同步 异步
- Python3爬虫系列02 (理论) - Python并发编程
- Python3爬虫系列06 (理论) - 可迭代对象、迭代器、生成器
- Python3爬虫系列07 (理论) - 协程
- Python3爬虫系列08 (理论) - 使用asyncio模块实现并发



