- git clone

```bash
git clone git@github.com:evansuner/smartProxyPool.git
```

- Install requirements

```bash
pip3 install -r requirements.txt
```

- Update configuration (optional)
Edit `setting.py`:

```python
# API Server Configuration
HOST = "0.0.0.0"  # listen IP
PORT = 9091       # listen port

# Database Configuration
DB_CONN = 'redis://:pwd@127.0.0.1:8888/0'

# ProxyFetcher Configuration
PROXY_FETCHER = [
    "freeProxy01",  # names of the enabled proxy-fetching methods;
    "freeProxy02",  # all fetch methods live in fetcher/proxyFetcher.py
    # ....
]
```

- Start project
```bash
# If the environment is ready, you can start the project through server_proxy_pool.py.
# The program has two parts: the schedule scheduler and the server API service.
# Start the schedule program and the web API server:
python3 server_proxy_pool.py schedule &
python3 server_proxy_pool.py server
```

Or run it with Docker:

```bash
docker pull jhao104/proxy_pool
docker run --env DB_CONN=redis://:123456@ip:port/0 -p 9090:9090 jhao104/proxy_pool:latest
```

Or with docker compose:

```bash
docker compose up -d
```

- Api

Once the server is running, the API is served at http://127.0.0.1:9090:
| api | method | description | params |
|---|---|---|---|
| / | GET | API description | None |
| /get | GET | randomly get a proxy | optional param: `?type=https` to filter https proxies |
| /pop | GET | get and remove a proxy | optional param: `?type=https` to filter https proxies |
| /all | GET | get all proxies | optional param: `?type=https` to filter https proxies |
| /count | GET | view proxy counts | None |
| /delete | GET | delete a proxy | `?proxy=host:port` |
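As a quick sanity check, the endpoints above can be called with a few lines of Python. This is a minimal sketch: the base URL follows the examples in this README, and the JSON shape of `/get` (a dict with a `proxy` field) is an assumption about the running instance, so verify it against your deployment.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://127.0.0.1:9090"  # adjust to match HOST/PORT in setting.py

def build_url(path, https_only=False):
    """Build an API URL such as /get?type=https."""
    query = "?" + urllib.parse.urlencode({"type": "https"}) if https_only else ""
    return f"{API_BASE}{path}{query}"

def extract_proxy(payload):
    """Pull the 'host:port' string out of a /get JSON payload (assumed shape)."""
    return payload.get("proxy") if isinstance(payload, dict) else None

def get_proxy(https_only=False):
    """Fetch one random proxy from the pool; returns 'host:port' or None."""
    with urllib.request.urlopen(build_url("/get", https_only), timeout=5) as resp:
        return extract_proxy(json.load(resp))
```

Only the standard library is used here; if your project already depends on `requests`, the same calls translate directly.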
- Using in Spiders

To use these proxies in spiders, wrap the API calls in helper functions, for example:

```python
import requests

def get_proxy():
    return requests.get("http://127.0.0.1:9090/get/").json()

def delete_proxy(proxy):
    return requests.get(f"http://127.0.0.1:9090/delete?proxy={proxy}")

# your spider code here
def parse():
    """..."""
    retry_count = 5
    proxy = get_proxy().get("proxy")
    while retry_count > 0:
        try:
            # route the request through the proxy
            return requests.get(
                "https://www.example.com",
                proxies={"http": f"http://{proxy}", "https": f"http://{proxy}"},
            )
        except requests.exceptions.RequestException:
            retry_count -= 1
    # the proxy failed repeatedly: remove it from the pool
    delete_proxy(proxy)
    return None
```

Although this project provides several free proxy sources, the quality of free proxies is limited. If the proxies you get from this project are not useful, you can add new proxy sources as follows:
- Add a new `@staticmethod` to the `ProxyFetcher` class. The method should be a generator (`yield`) that returns proxies formatted as `host:port`, for example:

```python
class ProxyFetcher:
    """..."""
    ...

    @staticmethod
    def freeProxyCustom1():
        """the method name must not duplicate an existing one"""
        proxies = ["x.x.x.x:5000", "x.x.x.x:5001"]
        for proxy in proxies:
            yield proxy
```

- Update the settings config:
```python
PROXY_FETCHER = [
    ...,
    "freeProxyCustom1",
]
```

Finally, the schedule process will use your method to scrape new proxies.
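In practice, a custom fetcher usually scrapes a listing page rather than yielding a hardcoded list. The sketch below shows that generator pattern under stated assumptions: the URL is a placeholder, the page is assumed to embed `host:port` strings in its HTML, and `parse_proxies` is a helper introduced here for illustration.

```python
import re
import urllib.request

# matches strings like 1.2.3.4:8080
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b")

def parse_proxies(html):
    """Yield 'host:port' strings found anywhere in an HTML page."""
    for host, port in PROXY_RE.findall(html):
        yield f"{host}:{port}"

def freeProxyCustom2():
    """Hypothetical fetcher: scrape a (made-up) free-proxy listing page."""
    url = "https://example.com/free-proxy-list"  # placeholder source URL
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="ignore")
    yield from parse_proxies(html)
```

Register the new method in `PROXY_FETCHER` the same way as `freeProxyCustom1` above; keeping the parsing logic in a separate helper makes it easy to test without hitting the network.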