Description
What's the problem this feature will solve?
pip's internal code gives no clear indication of when a project name is known to be correctly normalised. As a result, code tends to call canonicalize_name() "just in case".
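For reference, the rule that canonicalize_name() implements is the one specified in PEP 503: runs of `-`, `_` and `.` are treated as equivalent and collapse to a single `-`, and the result is lowercased. The sketch below reimplements that rule with only the standard library. The name `pep503_normalize` is hypothetical and is not pip's or packaging's API. It also shows that normalisation is idempotent, which is why the "just in case" calls are harmless but redundant.

```python
import re

# PEP 503: runs of "-", "_" and "." are equivalent and collapse to one "-";
# names compare case-insensitively, so the result is lowercased.
_SEPARATORS = re.compile(r"[-_.]+")


def pep503_normalize(name: str) -> str:
    """Return the PEP 503 normalised form of a project name (hypothetical helper)."""
    return _SEPARATORS.sub("-", name).lower()


if __name__ == "__main__":
    for raw in ["Django", "zope.interface", "Foo__Bar-.baz"]:
        print(raw, "->", pep503_normalize(raw))

    # Normalising an already-normalised name changes nothing, so a second
    # "just to be sure" call only adds cost and noise.
    once = pep503_normalize("Foo__Bar-.baz")
    assert pep503_normalize(once) == once
```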
While the cost of the extra calls is small, they make the logic harder to reason about, and the harder it is to reason about, the more likely it is that people will "just call canonicalize_name() to be sure", compounding the issue.
Describe the solution you'd like
A clear, well-documented statement of which parts of the code are responsible for ensuring that project names are in canonical form, so that the rest of pip's code can confidently use names as they are provided to it.
Alternative Solutions
It would be nice if we could use type checks to enforce normalisation hygiene, but apparently mypy treats a type alias as identical to the type it aliases, so it can't enforce this. It's not (IMO) obvious that it's worth the cost of having an actual NormalizedName class. I could be convinced otherwise, but I don't think it's a productive place to start.
Additional context
The issue here is similar in principle to Unicode safety and can be approached the same way: normalise at the boundaries of the code, and use normalised values exclusively throughout the internal code.
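The boundary approach can be sketched roughly as follows, under stated assumptions. Names arriving from outside (user input, installed metadata) are normalised exactly once at the entry point, and internal functions take their arguments as already canonical without re-normalising. All function names here are hypothetical, not pip's actual code, and `canonicalize_name` is a local stand-in reimplementing the PEP 503 rule so the example runs without the `packaging` dependency.

```python
import re
from typing import Dict, Iterable, List, Optional

_SEPARATORS = re.compile(r"[-_.]+")


def canonicalize_name(name: str) -> str:
    # Local stand-in for packaging.utils.canonicalize_name (PEP 503 rule).
    return _SEPARATORS.sub("-", name).lower()


# --- Boundaries: every name entering from outside is normalised once. ---
def names_from_user_input(raw_names: Iterable[str]) -> List[str]:
    """Hypothetical entry point: normalise names given on the command line."""
    return [canonicalize_name(raw) for raw in raw_names]


def load_installed(metadata: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical entry point: normalise names read from installed metadata."""
    return {canonicalize_name(name): version for name, version in metadata.items()}


# --- Internal code: trusts that its inputs are already canonical. ---
def installed_version(installed: Dict[str, str], canonical_name: str) -> Optional[str]:
    """Look up a version; no "just in case" canonicalize_name() call here."""
    return installed.get(canonical_name)


if __name__ == "__main__":
    installed = load_installed({"Zope.Interface": "5.4.0", "Django": "4.2"})
    for name in names_from_user_input(["zope_interface", "DJANGO"]):
        print(name, installed_version(installed, name))
```

The design choice is that responsibility for normalisation sits in a small, documented set of entry points, so a reader of `installed_version` never has to wonder whether its argument still needs canonicalising.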