@vinnymeller

This plugin is currently broken on Scrapy 2.12.

scrapy.utils.request.request_fingerprint was removed in Scrapy 2.12 (https://docs.scrapy.org/en/2.12/news.html#deprecation-removals).

If accepted, a version bump would be helpful 🙂

  strategy:
    matrix:
-     python-version: [3.5, 3.6, 3.7, 3.8, 3.9]
+     python-version: ["3.7", "3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]
Author

Python 3.5 and 3.6 don't seem to work anymore, at least in my fork.

  from scrapy.http import Request
  from scrapy.item import Item
- from scrapy.utils.request import request_fingerprint
+ from scrapy.utils.request import fingerprint
Contributor

On Scrapy 2.7+, the right approach is using Crawler.request_fingerprinter.fingerprint().
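
A minimal sketch of what that could look like, assuming the middleware keeps a reference to the crawler's fingerprinter (Scrapy exposes it on the crawler as `request_fingerprinter`). `DeltaFetchSketch` is a hypothetical, trimmed-down class and not the plugin's actual code; `FakeFingerprinter`, `fake_crawler` and `request` are stand-ins so the snippet runs without Scrapy installed.

```python
import hashlib
from types import SimpleNamespace


class DeltaFetchSketch:
    """Hypothetical, trimmed-down middleware; not the plugin's real code."""

    def __init__(self, crawler):
        # Keep the crawler's fingerprinter so _get_key can use it later.
        self.fingerprinter = crawler.request_fingerprinter

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook when it builds the middleware.
        return cls(crawler)

    def _get_key(self, request):
        # An explicit meta key wins; otherwise use the request fingerprint.
        key = request.meta.get("deltafetch_key")
        if key is None:
            return self.fingerprinter.fingerprint(request)
        return key.encode("utf-8") if isinstance(key, str) else key


class FakeFingerprinter:
    """Stand-in (assumption): mimics only the fingerprint(request) -> bytes call."""

    def fingerprint(self, request):
        return hashlib.sha1(request.url.encode("utf-8")).digest()


# Stand-in crawler and request objects, not Scrapy classes.
fake_crawler = SimpleNamespace(request_fingerprinter=FakeFingerprinter())
request = SimpleNamespace(url="https://example.com/page", meta={})

mw = DeltaFetchSketch.from_crawler(fake_crawler)
key = mw._get_key(request)
```

Going through the crawler's fingerprinter, rather than a module-level function, means the key follows whatever fingerprinter the project has configured.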

Author

@Gallaecio wouldn't using this require storing a Crawler object (which would also have to be instantiated somewhere) as an instance variable?

From the messages in Scrapy 2.11.2 (https://github.com/scrapy/scrapy/blob/e8cb5a03b382b98f2c8945355076390f708b918d/scrapy/utils/request.py#L86-L136), the suggestion seems to be to get the crawler during instantiation with DeltaFetch.from_crawler, if I am reading them right.

But what if the DeltaFetch object isn't instantiated that way? What should happen in DeltaFetch._get_key?

Contributor

@Gallaecio, Nov 25, 2024

DeltaFetch is a spider middleware, instantiated by Scrapy.

Most Scrapy components are instantiated with the create_instance (Scrapy 2.11 and earlier) / build_from_crawler (Scrapy 2.12+) functions, which call from_crawler if it is defined. Spider middlewares are definitely among those components, so __init__ should never be called without from_crawler being called first.

It does happen in the tests, which use self.mwcls(). Those would need to change to use either from_crawler or, better yet, create_instance / build_from_crawler.
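
To illustrate the difference for the tests, here is a rough sketch. `build_like_scrapy` is a simplified stand-in written for this example and is not Scrapy's create_instance / build_from_crawler implementation; `MiddlewareSketch` and `fake_crawler` are likewise hypothetical. In a real test suite the crawler would more likely come from a helper such as scrapy.utils.test.get_crawler.

```python
from types import SimpleNamespace


class MiddlewareSketch:
    """Hypothetical middleware that only gets its crawler via from_crawler."""

    def __init__(self, crawler=None):
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)


def build_like_scrapy(objcls, crawler, *args, **kwargs):
    """Simplified stand-in (assumption): prefer from_crawler when it exists."""
    if hasattr(objcls, "from_crawler"):
        return objcls.from_crawler(crawler, *args, **kwargs)
    return objcls(*args, **kwargs)


# Stand-in crawler object, not a real scrapy.crawler.Crawler.
fake_crawler = SimpleNamespace(request_fingerprinter=object())

# What a test calling the class directly gets: no crawler attached.
direct = MiddlewareSketch()

# What a test going through the builder gets: from_crawler ran first.
built = build_like_scrapy(MiddlewareSketch, fake_crawler)
```

The instance built directly has no crawler, so any method that relies on the fingerprinter would fail in the tests, while the one built through the helper mirrors how Scrapy itself instantiates the middleware.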
