tiwariayush / python-web-extractor

Scrapes data from websites that load their data with JavaScript.


python-web-extractor

Scrapes data from websites that load their data with JavaScript. So far, it extracts data from two online marketplaces: Amazon and Flipkart.

#Contributors

#Dependencies

Python
webscraping (http://docs.webscraping.com/index.html)
csv
For webkit_extractor, you need to install PyQt4: `sudo apt-get install python-pyqt4`

#Installation/Usage

Fork and clone the repository.
Move the files extract.py and webpage_xpath.csv to your app directory.
Call the function extract(url) to get the data and use it as you wish.
To extract data from other websites, just add their XPaths to webpage_xpath.csv.
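
The idea of keeping per-site XPaths in a CSV file and picking the right set for a URL can be sketched as below. This is only an illustration, not the code in extract.py: the column names (`domain`, `field`, `xpath`), the helper names `load_xpaths` and `xpaths_for`, and the XPath values are all hypothetical, and the real layout of webpage_xpath.csv may differ.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical CSV layout: one row per (domain, field) pair.
# The XPath values are placeholders, not real selectors for these sites.
SAMPLE_CSV = """domain,field,xpath
amazon.in,title,//h1[@class='example-title']
amazon.in,price,//span[@class='example-price']
flipkart.com,title,//h1[@class='example-name']
"""


def load_xpaths(csv_file):
    """Return {domain: {field: xpath}} from an open CSV file object."""
    table = {}
    for row in csv.DictReader(csv_file):
        table.setdefault(row["domain"], {})[row["field"]] = row["xpath"]
    return table


def xpaths_for(url, table):
    """Pick the XPath set whose domain matches the URL's host name."""
    host = urlparse(url).hostname or ""
    for domain, fields in table.items():
        # Match the bare domain or any subdomain such as "www.".
        if host == domain or host.endswith("." + domain):
            return fields
    return {}  # unknown site: no XPaths configured


table = load_xpaths(io.StringIO(SAMPLE_CSV))
print(xpaths_for("https://www.amazon.in/dp/EXAMPLE", table))
```

With a layout like this, supporting a new website would only mean appending rows to the CSV; the lookup code stays unchanged.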

#License

python-web-extractor is licensed under the MIT license.