gh-137146: Restrict IPvFuture address parsing to RFC 3986-valid characters (#137147)

Open
mauricelambert wants to merge 6 commits into python:main from mauricelambert:fix/mauricelambert/urllib.parse/IPvFuture_regex

@mauricelambert mauricelambert commented Jul 27, 2025

This PR fixes overly permissive validation of IPvFuture hostnames in urllib.parse (#137146).

Previously, the regex used to match IPvFuture (v...) components accepted any characters after the version prefix (.+), which is too permissive. According to RFC 3986 §3.2.2, an IPvFuture address must match the following structure:

"v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

Where:

  • unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
  • sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

This patch replaces the permissive regex with one that strictly enforces this allowed character set.
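
For illustration, the grammar above can be sketched as a regular expression. This is only a minimal sketch written from the RFC 3986 rule quoted in this description, not the pattern used in the patch; the names `IPVFUTURE_RE` and `is_valid_ipvfuture` are hypothetical, and the sketch assumes a lowercase `v` prefix and a host string with the surrounding brackets already removed.

```python
import re

# Hypothetical pattern built from the RFC 3986 rule:
#   "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
# Assumption: lowercase "v" only; brackets are stripped by the caller.
IPVFUTURE_RE = re.compile(
    r"v[0-9A-Fa-f]+"                     # "v" followed by one or more hex digits
    r"\."                                # literal dot separator
    r"[A-Za-z0-9\-._~!$&'()*+,;=:]+"     # unreserved / sub-delims / ":"
)


def is_valid_ipvfuture(host: str) -> bool:
    """Return True if host (without brackets) matches the IPvFuture rule."""
    return IPVFUTURE_RE.fullmatch(host) is not None


print(is_valid_ipvfuture("v45.test"))      # allowed characters only
print(is_valid_ipvfuture("v45.test|bad"))  # "|" is outside the allowed set
```

Using `fullmatch` means the whole string has to satisfy the rule, so a single disallowed character anywhere in the address is enough to reject it.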

Before the fix:

>>> import urllib.parse
>>> urllib.parse.urlparse("http://[v45.test|bad]/path")
ParseResult(scheme='http', netloc='[v45.test|bad]', path='/path', ...)

After the fix:

>>> urllib.parse.urlparse("http://[v45.test|bad]/path")
ValueError: IPvFuture address is invalid

This improves standards compliance and helps prevent silent acceptance of malformed or unsafe host components.
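
Because a rejected host surfaces as a `ValueError`, callers that parse untrusted URLs may want to catch it. The following is a minimal sketch of one way to do that; the helper name `hostname_or_none` is hypothetical and not part of `urllib.parse`, and which inputs actually raise depends on the Python version in use.

```python
from urllib.parse import urlparse


def hostname_or_none(url: str):
    """Return the parsed hostname, or None if the URL is rejected.

    Hypothetical helper: urlparse raises ValueError for hosts it
    considers invalid, and this wrapper turns that into None.
    """
    try:
        return urlparse(url).hostname
    except ValueError:
        return None


print(hostname_or_none("http://example.com/path"))
print(hostname_or_none("http://[v45.test|bad]/path"))  # result depends on the Python version
```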

…acters

IPvFuture hostnames in URLs were being matched using an overly permissive
regex (`.+`), which accepted characters that RFC 3986 does not allow.
This patch updates the pattern to accept only the characters explicitly
allowed by the RFC for IPvFuture addresses.

According to RFC 3986 §3.2.2, the format of IPvFuture is:

  "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

Where:
  - unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
  - sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Before the fix:

>>> import urllib.parse
>>> urllib.parse.urlparse("http://[v45.test|test]/path")
ParseResult(scheme='http', netloc='[v45.test|test]', path='/path', ...)

Invalid characters such as `|` were incorrectly accepted.

After the fix:

>>> import urllib.parse
>>> urllib.parse.urlparse("http://[v45.test|test]/path")
Traceback (most recent call last):
    ...
ValueError: IPvFuture address is invalid

This improves standards compliance and prevents malformed URLs from being
silently accepted.
@StanFromIreland changed the title from "#137146: Restrict IPvFuture address parsing to RFC 3986-valid characters" to "gh-137146: Restrict IPvFuture address parsing to RFC 3986-valid characters" on Jul 27, 2025
@github-actions

This PR is stale because it has been open for 30 days with no activity.

@github-actions (bot) added the "stale" label (Stale PR or inactive for long period of time.) on Apr 27, 2026

Labels

awaiting review, stale (Stale PR or inactive for long period of time.)
