Fix how 'Location' is parsed from header text#6878
Fix how 'Location' is parsed from header text#6878Hainish merged 4 commits intoEFForg:masterfrom jeremyn:fix-location-header-parsing
Conversation
| if not location: | ||
| location = None | ||
| for piece in headerStr.split('\n'): | ||
| if piece.startswith('Location:'): |
There was a problem hiding this comment.
We should be careful here. Although the Location header is usually capitalized in this way, it is not always. See https://www.shodan.io/search?query=location%3A
There was a problem hiding this comment.
You're right, thanks. Also RFC 7230 says header field names are case-insensitive. This should be easy to fix.
| location = None | ||
| for piece in headerStr.lower().split('\n'): | ||
| if piece.startswith('location:'): | ||
| location = ''.join(piece.split(': ')[1:]) |
There was a problem hiding this comment.
For consistency, you might want to make the argument for the startswith call the same as the argument for the split call. This way, if the web server returns an improperly formatted header (lacking the space) it won't cause an error or unintended behavior.
There was a problem hiding this comment.
Good point again, I'll change it. Actually though it would be properly formatted. From the same RFC (bold mine):
Each header field consists of a case-insensitive field name followed by a colon (":"), optional leading whitespace, the field value, and optional trailing whitespace.
There was a problem hiding this comment.
Ah, interesting - I've never seen headers lacking the leading whitespace. In that case you might want to change just line 383 to be:
location = ''.join(piece.split(':')[1:]).strip()There was a problem hiding this comment.
''.join instead of ':'.join destroys the :. This wasn't noticeable with the much less likely :<space>, but very noticeable with : as in http://. (I made that mistake too.)
| raise HTTPFetcherError("Pycurl fetch failed for '%s'" % newUrl) | ||
| elif httpCode in (301, 302, 303, 307): | ||
| # Parse out the headers and extract location, case-insensitively. | ||
| # If there are multiple location headers, pick the last one. |
There was a problem hiding this comment.
I'm curious, why remove the above comments?
There was a problem hiding this comment.
I distrust comments in general. The code here is six lines long, it doesn't need two lines of comments.
There was a problem hiding this comment.
Fair enough. I differ in opinion on comments, but I don't think it matters in this case.
| location = None | ||
| for piece in headerStr.lower().split('\n'): | ||
| if piece.startswith('location:'): | ||
| location = ':'.join(piece.split(':')[1:]).strip() |
There was a problem hiding this comment.
I just spotted the source of another possible problem. Here, the location will always be a lowercased version of the URL, due to line 381. URLs should be treated as case-sensitive, though.
URLs in general are case-sensitive (with the exception of machine names). There may be URLs, or parts of URLs, where case doesn't matter, but identifying these may not be easy. Users should always consider that URLs are case-sensitive.
There was a problem hiding this comment.
Right again, in fact that comes up in the https://itunes.com example because the Location there is http://www.apple.com/itunes/?cid=OAS-US-DOMAINS-itunes.com . I'll fix it tomorrow before tomorrow.
|
Great, lgtm |
|
Does this mean that this is actually a bug in curl in general? |
|
No, I think the bug was entirely on our side because we failed to parse out the |
|
I was just thinking about #6871 (comment).
|
|
Oh. Of course @J0WI has to confirm but I assume they meant that they saw |
|
Ah, could be. I'm just getting errors altogether; |
|
https://www.itunes.com has an invalid certificate, that's what #6871 is trying to fix. I get something similar with |
|
Looks like it's an iTunes thing: curl/curl#1020 (comment). |
|
I don't know if this helps but when I did a GET on it through Fiddler debugging proxy the server failed a compliance check:
|
|
@jeremyn I get no (.venv)user@personal:~$ curl -I https://www.itunes.com
curl: (51) SSL: no alternative certificate subject name matches target host name 'www.itunes.com'
(.venv)user@personal:~$ curl -v https://www.itunes.com
* Rebuilt URL to: https://www.itunes.com/
* Hostname was NOT found in DNS cache
* Trying 17.172.224.35...
* Connected to www.itunes.com (17.172.224.35) port 443 (#0)
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / AES128-GCM-SHA256
* Server certificate:
* subject: 1.3.6.1.4.1.311.60.2.1.3=US; 1.3.6.1.4.1.311.60.2.1.2=California; businessCategory=Private Organization; serialNumber=C0806592; C=US; postalCode=95014; ST=California; L=Cupertino; street=1 Infinite Loop; O=Apple Inc.; OU=Internet Services; CN=itunes.com
* start date: 2015-08-19 00:00:00 GMT
* expire date: 2017-08-19 23:59:59 GMT
* subjectAltName does not match www.itunes.com
* SSL: no alternative certificate subject name matches target host name 'www.itunes.com'
* Closing connection 0
* SSLv3, TLS alert, Client hello (1):
curl: (51) SSL: no alternative certificate subject name matches target host name 'www.itunes.com'FYI, the error indicates that this host is using SAN to indicate which FQDNs the certificate is valid for, but the names listed in the x509 (itunes.com) didn't match the host name you specified (www.itunes.com). You can verify that with this command: Looking at the |
|
@Hainish https://itunes.com is what redirects and has the |
|
Ah. Yeah in that case the SSL error see is strange. For other https hosts that redirect, |
This fixes the test problem identified in pull request #6871 , where this code apparently (?) never finds the
Locationheader, for example:Note that if in fact
locationhas been incorrectlyFalse-y for a while, then it's possible the code immediately after theraise HTTPFetcherErrorline has not been run recently or ever, and this fix may reveal new bugs in it.Pinging @fuglede @Hainish @J0WI for review.