@@ -25,7 +25,8 @@ Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
2525``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
2626``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
2727``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
28- ``wais``, ``ws``, ``wss``.
28+ ``wais``, ``ws``, ``wss``. The behavior of other schemes may be controlled by
29+ passing a collection of ``SchemeClass`` enum members to the functions that accept one.
2930
3031The :mod:`urllib.parse` module defines functions that fall into two broad
3132categories: URL parsing and URL quoting. These are covered in detail in
@@ -37,24 +38,33 @@ URL Parsing
3738The URL parsing functions focus on splitting a URL string into its components,
3839or on combining URL components into a URL string.
3940
40- .. function:: urlparse(urlstring, scheme='', allow_fragments=True)
41+ .. function:: urlparse(urlstring, scheme='', allow_fragments=True, classes=set())
4142
42- Parse a URL into six components, returning a 6-item :term:`named tuple`. This
43- corresponds to the general structure of a URL:
44- ``scheme://netloc/path;parameters?query#fragment``.
45- Each tuple item is a string, possibly empty. The components are not broken up
46- into smaller parts (for example, the network location is a single string), and %
43+ Parse a URL into six components with respect to given scheme classes,
44+ returning a 6-item :term:`named tuple`. This corresponds to the general
45+ structure of a URL: ``scheme://netloc/path;parameters?query#fragment``. Each
46+ tuple item is a string, possibly empty. The components are not broken up into
47+ smaller parts (for example, the network location is a single string), and %
4748 escapes are not expanded. The delimiters as shown above are not part of the
48- result, except for a leading slash in the *path* component, which is retained if
49- present. For example:
49+ result, except for a leading slash in the *path* component, which is retained
50+ if present.
51+
52+ The scheme of the URL determines whether parameters are parsed as a
53+ component distinct from the path. To parse parameters regardless of the
54+ scheme, pass ``SchemeClass.PARAMS`` in the *classes* argument.
55+
56+ For example:
5057
5158 .. doctest::
5259 :options: +NORMALIZE_WHITESPACE
5360
54- >>> from urllib.parse import urlparse
61+ >>> from urllib.parse import urlparse, SchemeClass
5562 >>> urlparse("scheme://netloc/path;parameters?query#fragment")
5663 ParseResult(scheme='scheme', netloc='netloc', path='/path;parameters', params='',
5764 query='query', fragment='fragment')
65+ >>> urlparse("scheme://netloc/path;parameters?query#fragment", classes=[SchemeClass.PARAMS])
66+ ParseResult(scheme='scheme', netloc='netloc', path='/path',
67+ params='parameters', query='query', fragment='fragment')
5868 >>> o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?"
5969 ... "highlight=params#url-parsing")
6070 >>> o
@@ -348,19 +358,21 @@ or on combining URL components into a URL string.
348358 with an empty query; the RFC states that these are equivalent).
349359
350360
351- .. function:: urljoin(base, url, allow_fragments=True)
361+ .. function:: urljoin(base, url, allow_fragments=True, classes=set())
352362
353363 Construct a full ("absolute") URL by combining a "base URL" (*base*) with
354- another URL (*url*). Informally, this uses components of the base URL, in
355- particular the addressing scheme, the network location and (part of) the
356- path, to provide missing components in the relative URL. For example:
364+ another URL (*url*), with scheme-dependent behavior optionally overridden by
365+ *classes*, a collection of ``SchemeClass`` members. Informally, this uses
366+ components of the base URL, in particular the addressing scheme, the network
367+ location and (part of) the path, to provide missing components in the relative URL. For example:
357368
358369 >>> from urllib.parse import urljoin
359370 >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
360371 'http://www.cwi.nl/%7Eguido/FAQ.html'
361372
362373 The *allow_fragments* argument has the same meaning and default as for
363- :func:`urlparse`.
374+ :func:`urlparse`. As with :func:`urlparse`, a collection of ``SchemeClass``
375+ members may be passed as *classes* to override the behavior inferred from the scheme.
364376
365377 .. note::
366378
@@ -543,6 +555,53 @@ operating on :class:`bytes` or :class:`bytearray` objects:
543555
544556 .. versionadded:: 3.2
545557
558+ Special URL Behaviors and Scheme Classes
559+ ----------------------------------------
560+
561+ :mod:`urllib.parse` recognizes three special properties of URL schemes: relative
562+ addressing (used by, for instance, the ``ftp``, ``http``, and ``gopher``
563+ schemes), netloc-sensitive resolution (used by the ``ftp``, ``http``, and
564+ ``git`` schemes), and support for path parameters (for instance, ``ftp``
565+ and ``tel``).
566+
567+ Relative addressing allows relative URLs to be resolved against a base URL, and
568+ netloc-sensitive resolution takes the network location (netloc) of the URL into
569+ account. HTTP URLs have both behaviors by default, as the following example
570+ demonstrates:
571+
572+ >>> from urllib.parse import urljoin
573+ >>> urljoin('http://example.org/post/x', '../y')
574+ 'http://example.org/y'
575+
576+ Additionally, unless a scheme is marked as supporting parameters (the text
577+ following a semicolon in the path), parameters are treated as part of the path
578+ rather than as a distinct component.
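
The effect described above can be seen with the existing :func:`urlparse` alone, without any ``classes`` argument. The following is a minimal sketch; ``example-scheme`` is a made-up scheme name chosen only because it does not appear in ``uses_params``.

```python
from urllib.parse import urlparse

# "http" is listed in urllib.parse.uses_params, so the text after the
# semicolon is split off into the params component.
known = urlparse("http://netloc/path;parameters?query#fragment")

# "example-scheme" is a made-up scheme that is not in uses_params, so the
# semicolon and everything after it stay in the path.
unknown = urlparse("example-scheme://netloc/path;parameters?query#fragment")

print(known.path, repr(known.params))
print(unknown.path, repr(unknown.params))
```

In both cases the query and fragment are split off as usual; only the handling of the semicolon differs.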
579+
580+ Unless a ``classes`` argument is passed or the module-level lists are modified,
581+ :mod:`urllib.parse` infers which behaviors apply from the scheme. The schemes
582+ associated with each behavior are given by three lists in :mod:`urllib.parse`:
583+
584+ * ``urllib.parse.uses_relative``
585+ * ``urllib.parse.uses_netloc``
586+ * ``urllib.parse.uses_params``
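
For comparison, the "modifying global variables" route can be sketched with the existing module-level lists. This is only an illustration: ``join_with_registered_scheme`` is a hypothetical helper, not part of :mod:`urllib.parse`, and ``my-protocol`` is a made-up scheme. The helper registers the scheme temporarily and restores the lists afterwards, because the change is otherwise process-wide.

```python
import urllib.parse
from urllib.parse import urljoin


def join_with_registered_scheme(scheme, base, url):
    """Join base and url while scheme is temporarily in the uses lists.

    Hypothetical helper for illustration; not part of urllib.parse.
    """
    added = []
    for uses in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
        if scheme not in uses:
            uses.append(scheme)
            added.append(uses)
    try:
        return urljoin(base, url)
    finally:
        # Undo only the registrations made by this call.
        for uses in added:
            uses.remove(scheme)


base = "my-protocol://example.org/post/x"
before = urljoin(base, "../y")  # scheme unknown: the relative URL comes back unchanged
during = join_with_registered_scheme("my-protocol", base, "../y")
after = urljoin(base, "../y")   # the lists have been restored

print(before, during, after)
```

Mutating the lists affects every caller in the process, which is the drawback a per-call ``classes`` argument avoids.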
587+
588+ In addition, any function that takes a ``classes`` parameter (for instance,
589+ :func:`urlparse` and :func:`urljoin`) can override the behavior derived from
590+ these lists. For instance, a custom or rarely used scheme can be handled with
591+ the same behavior as HTTP:
592+
593+ >>> from urllib.parse import urljoin, SchemeClass
594+ >>> urljoin(
595+ ...     'my-protocol://example.org/post/x', '../y',
596+ ...     classes=[SchemeClass.NETLOC, SchemeClass.RELATIVE])
597+ 'my-protocol://example.org/y'
598+
599+ For reference, the following three scheme classes are defined, each
600+ corresponding to one of the uses lists:
601+
602+ * ``urllib.parse.SchemeClass.RELATIVE``
603+ * ``urllib.parse.SchemeClass.NETLOC``
604+ * ``urllib.parse.SchemeClass.PARAMS``
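
One way to picture the correspondence between the scheme classes and the uses lists is sketched below. This is an assumption-laden illustration, not the proposed implementation: the local ``SchemeClass`` enum is a stand-in defined here, and ``default_classes`` is a hypothetical helper that reads the existing module-level lists.

```python
import enum
import urllib.parse


class SchemeClass(enum.Enum):
    """Local stand-in for the proposed SchemeClass (assumed shape).

    Each member's value names the existing urllib.parse list it mirrors.
    """
    RELATIVE = "uses_relative"
    NETLOC = "uses_netloc"
    PARAMS = "uses_params"


def default_classes(scheme):
    """Return the classes a scheme gets from the uses lists (hypothetical)."""
    return {
        cls for cls in SchemeClass
        if scheme in getattr(urllib.parse, cls.value)
    }


http_classes = default_classes("http")           # "http" is in all three lists
custom_classes = default_classes("my-protocol")  # unknown scheme: no classes

print(sorted(c.name for c in http_classes))
print(sorted(c.name for c in custom_classes))
```

Under this reading, passing ``classes`` explicitly amounts to supplying the set that the lists would otherwise provide for the scheme.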
546605
547606URL Quoting
548607-----------