The ecoindex_scraper module provides a way to scrape data from a given website while simulating a real web browser.

alpha14/ecoindex_scrap_python

ECOINDEX SCRAPER PYTHON

This module provides a simple interface to get the Ecoindex of a given webpage, using the ecoindex-python module.

Requirements

  • Python ^3.10 with pip
  • Google Chrome installed on your computer

Install

pip install ecoindex-scraper

Use

Get a page analysis

You can run a page analysis by calling the function get_page_analysis():

(function) get_page_analysis: (url: HttpUrl, window_size: WindowSize | None = WindowSize(width=1920, height=1080), wait_before_scroll: int | None = 1, wait_after_scroll: int | None = 1) -> Coroutine[Any, Any, Result]

Example:

import asyncio
from pprint import pprint

from ecoindex_scraper.scrap import EcoindexScraper

pprint(
    asyncio.run(
        EcoindexScraper(url="http://ecoindex.fr")
        .init_chromedriver()
        .get_page_analysis()
    )
)

Result example:

Result(width=1920, height=1080, url=HttpUrl('http://ecoindex.fr', ), size=549.253, nodes=52, requests=12, grade='A', score=90.0, ges=1.2, water=1.8, ecoindex_version='5.0.0', date=datetime.datetime(2022, 9, 12, 10, 54, 46, 773443), page_type=None)
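The fields printed above can be read as ordinary attributes. Below is a minimal sketch that builds a one-line summary from such an object. It uses a stand-in dataclass (FakeResult, a hypothetical name that is not part of ecoindex-scraper) filled with values copied from the example output, so it runs without Chrome. With the real library you would presumably pass the object returned by get_page_analysis() instead.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in mirroring some fields of the Result shown above."""

    url: str
    grade: str
    score: float
    requests: int
    nodes: int
    size: float


def summarize(result) -> str:
    """Build a one-line summary from any object exposing the fields used below."""
    return (
        f"{result.url}: grade {result.grade}, score {result.score:.1f} "
        f"({result.requests} requests, {result.nodes} nodes, size {result.size})"
    )


# Values copied from the result example above.
example = FakeResult(
    url="http://ecoindex.fr",
    grade="A",
    score=90.0,
    requests=12,
    nodes=52,
    size=549.253,
)
print(summarize(example))
```

Because summarize() only relies on attribute access, it should work with any object that exposes the same field names.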

Default behaviour: the page analysis simulates:

  • Window size of 1920x1080 pixels (can be set with parameter window_size)
  • Wait for 1 second after the page has loaded (can be set with parameter wait_before_scroll)
  • Scroll to the bottom of the page (if possible)
  • Wait for 1 second after having scrolled to the bottom of the page (can be set with parameter wait_after_scroll)
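The order of the steps listed above can be sketched in plain Python. This is only an illustration of the documented sequence, not the library's implementation. The function simulate_visit and its load_page, scroll_to_bottom and sleep parameters are hypothetical names introduced here. Only wait_before_scroll and wait_after_scroll come from the documentation above, and the window size is left out for brevity.

```python
import time
from typing import Callable, List


def simulate_visit(
    load_page: Callable[[], None],
    scroll_to_bottom: Callable[[], None],
    wait_before_scroll: float = 1,
    wait_after_scroll: float = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run the documented steps in order and return a log of what happened."""
    log = []
    load_page()
    log.append("loaded")
    sleep(wait_before_scroll)  # pause once the page is loaded
    log.append(f"waited {wait_before_scroll}s")
    scroll_to_bottom()
    log.append("scrolled")
    sleep(wait_after_scroll)  # pause again after reaching the bottom
    log.append(f"waited {wait_after_scroll}s")
    return log


# Usage with no-op callables and a fake sleep, so nothing really waits.
steps = simulate_visit(lambda: None, lambda: None, sleep=lambda seconds: None)
print(steps)
```

Injecting the sleep function keeps the sketch fast to run. A real browser session would use actual delays.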

Get a page analysis and generate a screenshot

You can generate a screenshot of the analyzed page by passing a ScreenShot object as the screenshot argument of EcoindexScraper. You must define an id (any string, though a unique id is recommended) and a folder for the screenshot file (if the folder does not exist, it will be created).

import asyncio
from pprint import pprint
from uuid import uuid1

from ecoindex_scraper.models import ScreenShot
from ecoindex_scraper.scrap import EcoindexScraper

pprint(
    asyncio.run(
        EcoindexScraper(
            url="http://www.ecoindex.fr/",
            screenshot=ScreenShot(id=str(uuid1()), folder="./screenshots"),
        )
        .init_chromedriver()
        .get_page_analysis()
    )
)
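To illustrate the "folder is created if it does not exist" behaviour described above, here is a minimal standalone sketch using pathlib. It does not call ecoindex-scraper. The helper screenshot_path is hypothetical, and the `<id>.png` file naming is an assumption made for this example, not something the documentation above confirms.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from uuid import uuid1


def screenshot_path(folder: str, screenshot_id: str, extension: str = "png") -> Path:
    """Create the folder if needed and return the file path for this id."""
    target = Path(folder)
    # parents=True also creates missing parent folders; exist_ok avoids an
    # error when the folder is already there.
    target.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: "<id>.<extension>" inside the folder.
    return target / f"{screenshot_id}.{extension}"


# Usage in a temporary directory so nothing is left behind.
with TemporaryDirectory() as tmp:
    path = screenshot_path(f"{tmp}/screenshots", str(uuid1()))
    print(path.name, path.parent.is_dir())
```

A unique id such as the one produced by uuid1() should keep successive analyses from overwriting each other's files.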

Contribute

You need poetry to install and manage dependencies. Once poetry is installed, run:

poetry install

Tests

poetry run pytest
