@fsmeraldi fsmeraldi commented Feb 24, 2025

Removal of scrapy.utils.request.request_fingerprint() breaks scrapy_deltafetch. I solved this by replacing the deprecated function with a RequestFingerprinter object according to the new specifications. Tests modified accordingly.

Thank you for a very useful package!

Closes #50.

@Gallaecio
Contributor

While this indeed removes the removed import, it does not switch to the new way to handle request fingerprinting introduced in Scrapy 2.7.0. And to be fair, I have just realized how the Scrapy docs focus on the user information about request fingerprinting and neglect the component-author information.

Do you think you could refactor this PR to instead rely on self.crawler.request_fingerprinter.fingerprint() for request fingerprinting?

You could use hasattr(self.crawler, "request_fingerprinter") in from_crawler or __init__ to determine whether or not the installed Scrapy version supports it, use it where available, and import the old function where not available.

Test expectations may need to change as well when running a version of Scrapy that supports the new approach. For one, the new fingerprints are bytes, not str.
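
The version check suggested above could look roughly like the following. This is only a sketch, not the code in this PR: `get_fingerprint_func` and `FakeFingerprinter` are hypothetical names, and the stand-in crawler and request objects exist only so the snippet runs without Scrapy installed.

```python
from types import SimpleNamespace


def get_fingerprint_func(crawler):
    """Return a callable mapping a request to its fingerprint (hypothetical helper)."""
    if hasattr(crawler, "request_fingerprinter"):
        # Scrapy 2.7.0 and later: use the crawler's fingerprinter, which returns bytes.
        return crawler.request_fingerprinter.fingerprint
    # Older Scrapy: import lazily so the module still loads on versions
    # where this function has been removed. It returns a str.
    from scrapy.utils.request import request_fingerprint
    return request_fingerprint


class FakeFingerprinter:
    """Stand-in for a Scrapy request fingerprinter, for illustration only."""

    def fingerprint(self, request):
        return b"fp:" + request.url.encode("utf-8")


# Stand-ins for a crawler and a request on a Scrapy version with the new API.
new_crawler = SimpleNamespace(request_fingerprinter=FakeFingerprinter())
fingerprint = get_fingerprint_func(new_crawler)
result = fingerprint(SimpleNamespace(url="http://example.com"))
```

Keeping the old import inside the fallback branch means it is only attempted on Scrapy versions that lack `crawler.request_fingerprinter`.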

@fsmeraldi
Author

fsmeraldi commented Feb 25, 2025

> While this indeed removes the removed import, it does not switch to the new way to handle request fingerprinting introduced in Scrapy 2.7.0. And to be fair, I have just realized how the Scrapy docs focus on the user information about request fingerprinting and neglect the component-author information.
>
> Do you think you could refactor this PR to instead rely on self.crawler.request_fingerprinter.fingerprint() for request fingerprinting?
>
> You could use hasattr(self.crawler, "request_fingerprinter") in from_crawler or __init__ to determine whether or not the installed Scrapy version supports it, use it where available, and import the old function where not available.

I am actually passing the crawler to the constructor of RequestFingerprinter. My understanding is that the constructor does just that version check and handles REQUEST_FINGERPRINTER_CLASS; is that not the case? I also thought this was the current way of doing it. Sorry, I might have got in beyond my depth.

Sorry, I got that mixed up with REQUEST_FINGERPRINTER_IMPLEMENTATION. I think I can see what you mean, I will try to give it a go when I have time.

> Test expectations may need to change as well when running a version of Scrapy that supports the new approach. For one, the new fingerprints are bytes, not str.

I see the current implementation of to_bytes returns its argument unchanged if it is already a bytes object. So although the code does not check the correctness of the fingerprinting function itself, it does seem to check correctly that deltafetch works.
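
To illustrate the point about to_bytes, here is a minimal sketch. `to_bytes_sketch` is a simplified stand-in written for this example, not Scrapy's actual implementation, and the fingerprint values are made up.

```python
def to_bytes_sketch(value, encoding="utf-8"):
    """Simplified stand-in for the behaviour described above (not Scrapy's code)."""
    if isinstance(value, bytes):
        # Already bytes: returned unchanged, as with new-style fingerprints.
        return value
    if isinstance(value, str):
        # Old-style fingerprints are hex strings and get encoded.
        return value.encode(encoding)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


old_style = "0a1b2c"                 # made-up str fingerprint
new_style = bytes.fromhex("0a1b2c")  # made-up bytes fingerprint

key_old = to_bytes_sketch(old_style)
key_new = to_bytes_sketch(new_style)
```

Either kind of fingerprint ends up as a bytes key, so code that stores and looks up keys keeps working regardless of which fingerprinting API produced them.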

@Gallaecio
Contributor

Thanks!

I have moved the code to from_crawler to maximize backward compatibility (so that subclasses that do not pass crawler to super().__init__() not only do not break, but also use the new method unless _get_key is overridden).
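
As a rough illustration of why from_crawler helps here (a sketch under stated assumptions, not the merged code): the class names, the `fingerprint` attribute and the stand-in crawler below are all hypothetical, and the settings key is used only for illustration.

```python
from types import SimpleNamespace


class DeltaFetchSketch:
    """Hypothetical stand-in for the middleware; not the real class."""

    def __init__(self, directory):
        self.directory = directory
        self.fingerprint = None  # filled in by from_crawler

    @classmethod
    def from_crawler(cls, crawler):
        instance = cls(crawler.settings.get("DELTAFETCH_DIR"))
        # Set up fingerprinting after construction, so subclasses whose
        # __init__ never receives the crawler still get the new method.
        if hasattr(crawler, "request_fingerprinter"):
            instance.fingerprint = crawler.request_fingerprinter.fingerprint
        else:
            from scrapy.utils.request import request_fingerprint
            instance.fingerprint = request_fingerprint
        return instance

    def _get_key(self, request):
        # Subclasses overriding this method bypass the fingerprinter entirely.
        key = self.fingerprint(request)
        return key if isinstance(key, bytes) else key.encode("utf-8")


class LegacySubclass(DeltaFetchSketch):
    """Subclass written before the change: it does not forward a crawler."""

    def __init__(self, directory):
        super().__init__(directory)


class FakeFingerprinter:
    """Stand-in for a Scrapy request fingerprinter, for illustration only."""

    def fingerprint(self, request):
        return b"fp:" + request.url.encode("utf-8")


crawler = SimpleNamespace(
    settings={"DELTAFETCH_DIR": "/tmp/deltafetch"},
    request_fingerprinter=FakeFingerprinter(),
)
middleware = LegacySubclass.from_crawler(crawler)
key = middleware._get_key(SimpleNamespace(url="http://example.com"))
```

Because nothing in `__init__` depends on the crawler, the legacy subclass keeps its existing constructor signature and still picks up the crawler's fingerprinter.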

I have also upgraded the minimum required Python version to get the CI passing. I will open a separate PR to modernize the code base a bit in preparation for a release.

@Gallaecio Gallaecio requested review from kmike and wRAR February 25, 2025 20:03
@fsmeraldi
Author

That's neat, thank you very much!

@Gallaecio Gallaecio mentioned this pull request Feb 25, 2025