patch_parse: fix parsing unquoted filenames with spaces#4285
Conversation
|
Well, the first filename part could end with a space and be under a subpath starting with |
|
Yeah, correct, that's the actual issue. The issue becomes even more obvious when considering that it is possible to create custom prefixes. So instead of "a/" and "b/", the user can have "foobar/" and "whateverhelikes/" as prefixes, making it totally impossible for us to parse in all cases. One solution pointed out by @ethomson was to instead parse the "--- a/file\n+++ b/file" lines, but unfortunately they are not mandated by the patch format, either. |
6c27418 to
99506a4
Compare
|
This PR now depends on #4308. I've changed the code to follow @ethomson's proposal of just bailing on the "diff --git" line if it seems to be unparseable and instead just use the "---" and "+++" lines. What I'm still undecided about is whether we want to fill in |
Yeah, I don't love my proposal of moving to the Would your opinion about this behavior be different if I sent a patch to git to always quote filenames with spaces in them, ending this ambiguity once and for all? (I should do that either way, of course.) |
|
I definitly think this should be fixed upstream, it's simply awful right now. But that won't help users who are stuck with older versions of Git in the next 10+ years (if you count things like RHEL), so in my opinion this fix here is a nice thing to have. Bailing out whenever we cannot parse |
|
Sure. Out of curiosity, how does Git itself choose the filenames? Does it use the |
|
Git uses the Inside of that function: So if they're not able to strip the prefixes, they simply bail out and fail. Same for other stuff if they're not able to parse the names. They only abort parsing the patch later on if neither So this PR here goes into the same direction, I'd argue. |
99506a4 to
d5a192c
Compare
When parsing the "---" and "+++" line, we stop after the first whitespace inside of the filename. But as files containing whitespaces do not need to be quoted, we should instead use the complete line here. This fixes parsing patches with unquoted paths with whitespaces.
The git patch format allows for having unquoted paths with whitespaces
inside. This format becomes ambiguous to parse, e.g. in the following
example:
diff --git a/file b/with spaces.txt b/file b/with spaces.txt
While we cannot parse this in a correct way, we can instead use the
"---" and "+++" lines to retrieve the file names, as the path is not
followed by anything here but spans the complete remaining line. Because
of this, we can simply bail outwhen parsing the "diff --git" header here
without an actual error and then proceed to just take the paths from the
other headers.
d5a192c to
80226b5
Compare
|
Thanks for fixing this! 😀 |
This is split out from #4272 due to some problems with ambiguous parsing of patches with whitespace and custom prefixes. This should not be merged as-is but is only intended as a reminder.
The patch format allows unquoted path names even when the paths contain
whitespaces in between. To disambiguate the path names in thee header,
one can simply use the slash in "a/" respectively "b/", as no file is
allowed to contain a forward slash.
While this format is distinctly parseable, we fail to do so correctly.
To determine the path length, we simply search for the first whitespace
and assume it is the delimiter between "a/foo.file b/foo.file", which
works for the simple cases. Instead, we should be searching for the
first occurrence of either ' b/' (note the leading space) or '\n' to
either delimit the first file path or the second file path. This does
not suffice yet, though, as the format also allows for quoted file
paths. Insofar, we also have to check for ' "b/' as delimiter for the
current file path.
Fix the issue by improving the logic determining file path length as
outlined in the above paragraph. Include a test to catch future
breakage.