
Author onlynone
Date 2006-03-23.20:49:08
Content
urllib.splithost(url) requires the URL passed in to be
of the form '//host[:port]/path'. However, I've run
across some URLs of the form
'//host[:port]?querystring'. For these, splithost
returns the entire string as the host and an empty
string as the path.
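
The behavior can be reproduced without urllib itself. The following is a minimal sketch, assuming the pattern quoted later in this report is the one splithost uses; splithost_sketch is a hypothetical stand-in written for illustration, not the library function.

```python
import re

# Pattern quoted in this report as the Python 2.3 urllib.py definition.
_hostprog = re.compile('^//([^/]*)(.*)$')

def splithost_sketch(url):
    """Hypothetical stand-in for urllib.splithost, for illustration only."""
    match = _hostprog.match(url)
    if match:
        # Group 1 is the host[:port] part, group 2 is whatever follows it.
        return match.group(1, 2)
    return None, url

# A '/' after the authority: the split lands where expected.
print(splithost_sketch('//example.com:8080/path?x=1'))
# A '?' directly after the authority: the query ends up in the host.
print(splithost_sketch('//example.com:8080?x=1'))
```

The second call shows the problem: since only '/' ends the host group, the query string is swallowed into it.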


Section 3.2 of RFC 2396 (Uniform Resource Identifiers:
Generic Syntax) states that 'The authority component is
preceded by a double slash "//" and is terminated by
the next slash "/", question-mark "?", or by the end of
the URI.'

The same RFC defines an absolute URI as follows:

absoluteURI   = scheme ":" ( hier_part | opaque_part )
hier_part     = ( net_path | abs_path ) [ "?" query ]
net_path      = "//" authority [ abs_path ]
abs_path      = "/"  path_segments

Based on the above, 'http://authority?query' is a valid
URL: net_path allows the abs_path after the authority to
be omitted, and hier_part allows a query to follow.


In Python 2.3, the fix is to change line 939 of
urllib.py from:

        _hostprog = re.compile('^//([^/]*)(.*)$')

to:

        _hostprog = re.compile('^//([^/?]*)(.*)$')

This appears to affect all Python versions; I just
happened to be using 2.3.
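
The two patterns can be compared side by side. This is only a sketch under the assumption that splithost returns the two regex groups unchanged; split_with, OLD_HOSTPROG, and NEW_HOSTPROG are hypothetical names introduced here, not part of urllib.

```python
import re

# Pattern quoted in the report as the existing definition.
OLD_HOSTPROG = re.compile('^//([^/]*)(.*)$')
# Proposed pattern: the host also stops at the first '?'.
NEW_HOSTPROG = re.compile('^//([^/?]*)(.*)$')

def split_with(pattern, url):
    """Hypothetical helper: return (host, rest) using the given pattern."""
    match = pattern.match(url)
    if match:
        return match.group(1, 2)
    return None, url

# Print the old and new results for three shapes of URL.
for url in ('//host:80/path?q=1', '//host:80?q=1', '//host:80'):
    print(url, split_with(OLD_HOSTPROG, url), split_with(NEW_HOSTPROG, url))
```

Only the '//host:80?q=1' case differs between the two patterns, which suggests the change should not disturb URLs that already split correctly.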
History
Date                 User   Action  Args
2007-08-23 14:38:42  admin  link    issue1457264 messages
2007-08-23 14:38:42  admin  create