Skip to content

gitignore with escapes#5097

Merged
ethomson merged 7 commits into
libgit2:masterfrom
pks-t:pks/ignore-escapes
Jun 13, 2019
Merged

gitignore with escapes#5097
ethomson merged 7 commits into
libgit2:masterfrom
pks-t:pks/ignore-escapes

Conversation

@pks-t
Copy link
Copy Markdown
Member

@pks-t pks-t commented Jun 7, 2019

Various fixes to how we handle escapes when parsing gitignore patterns.

pks-t added 2 commits June 7, 2019 10:33
The "ignore.c.bak" file has accidentally been checked in via
commit 1916490 (ignore: test that leading whitespace is
significant, 2019-05-19) and should obviously not be part of our
test suites. Delete it.
We had several occasions where tests for the gitignore had been
added to status::ignore instead of the easier-to-handle
attr::ignore test suite. This most likely resulted from the fact
that the attr::ignore test suite is not easy to discover inside
of the attr folder. Furthermore, ignore being part of the
attributes code is an implementation detail, only, and thus
shouldn't be stressed as much.

Improve this by moving both attr::ignore and status::ignore tests
into a new ignore test suite.
@ethomson
Copy link
Copy Markdown
Member

Thanks for 65ad6b5. 😢

@ethomson
Copy link
Copy Markdown
Member

As there is no platform that regards "\foo" as absolute, to the best of my knowledge, only treat "" as absolute when we have a drive or Windows-style network path prefix. Add tests to catch regressions.

So this is an interesting question. Is a path prefixed by a \ absolute on Windows? And I think the answer is yes. Windows has the notion of a "current drive" that makes the drive specifier unnecessary in paths. eg, if your current drive is C: and you want to open \autoexec.bat then this is implicitly C:\autoexec.bat.

You can see this on the shell. If you're in cmd, you should be able to run C: to set your current drive to C:, then you should be able to run type \autoexec.bat and have it type C:\autoexec.bat. (OK, you probably can't do this, I'm not sure if C:\autoexec.bat exists on a modern system. Replace with a path that you know to exist on your drive.)

What I'm not sure is what that means exactly for us. I think that CreateFile("\foo") will absolutely take the current drive into account, so I think that it means that a \ prefixed path is absolute on Windows.

I think that this is what the existing code was trying to communicate - and that it is correct, but only on Windows.

@ethomson
Copy link
Copy Markdown
Member

I'm +1 on this otherwise.

@pks-t
Copy link
Copy Markdown
Member Author

pks-t commented Jun 11, 2019

I think that this is what the existing code was trying to communicate - and that it is correct, but only on Windows.

Okay, thanks for clarifying. My first approach was to simply move that code into the Windows-specific ifdef section. But I've been mighty surprised by us treating "C:"-style drive prefixes as absolute on all systems, but network-style absolute paths are only treated as absolute on Windows. So... I guess we should change that to also treat "C:"-style things as absolute on Windows-based systems, only?

@ethomson
Copy link
Copy Markdown
Member

Okay, thanks for clarifying. My first approach was to simply move that code into the Windows-specific ifdef section. But I've been mighty surprised by us treating "C:"-style drive prefixes as absolute on all systems, but network-style absolute paths are only treated as absolute on Windows. So... I guess we should change that to also treat "C:"-style things as absolute on Windows-based systems, only?

Ugh. I think so? Importantly, we (try to) do this right in the network code, where C:/foo is a local path on Windows and an ssh path everywhere else.

But that's a legitimate relative path on (non-macOS) Unix systems, so... yeah, we should probably be treating it as absolute only on Windows systems.

@ethomson
Copy link
Copy Markdown
Member

But that's a legitimate relative path on (non-macOS) Unix systems

Also, this is a lie. It's a legitimate path on macOS as well. I thought this was actually a reserved character in the Mac filesystem. (For pre-OS X compatibility, which used : as a path delimiter.)

But I'm mistaken. Some apps still don't allow you to create files with : in them (notably, Finder) but this is supported in the filesystem.

Comment thread src/attr_file.c Outdated
Windows-based systems treat paths starting with '\' as absolute,
either referring to the current drive's root (e.g. "\foo" might
refer to "C:\foo") or to a network path (e.g. "\\host\foo"). On
the other hand, (most?) systems that are not based on Win32
accept backslashes as valid characters that may be part of the
filename, and thus we cannot treat them to identify absolute
paths.

Change the logic to only paths starting with '\' as absolute on
the Win32 platform. Add tests to avoid regressions and document
behaviour.
@pks-t pks-t force-pushed the pks/ignore-escapes branch from 953310e to 880816f Compare June 13, 2019 08:45
When parsing attributes, we need to search for the first
unescaped whitespace character to determine where the pattern is
to be cut off. The scan fails to account for the case where the
escaping '\' character is itself escaped, though, and thus we
would not recognize the cut-off point in patterns like "\\ ".

Refactor the scanning loop to remember whether the last character
was an escape character. If it was and the next character is a
'\', too, then we will reset to non-escaped mode again. Thus, we
now handle escaped whitespaces as well as escaped wildcards
correctly.
@pks-t pks-t force-pushed the pks/ignore-escapes branch from 880816f to 3b51735 Compare June 13, 2019 09:03
pks-t added 3 commits June 13, 2019 11:03
When parsing attribute patterns, we will eventually unescape the
parsed pattern. This is required because we require custom
escapes for whitespace characters, as normally they are used to
terminate the current pattern. Thing is, we don't only unescape
those whitespace characters, but in fact all escaped sequences.
So for example if the pattern was "\*", we unescape that to "*".
As this is directly passed to fnmatch(3) later, fnmatch would
treat it as a simple glob matching all files where it should
instead only match a file with name "*".

Fix the issue by unescaping spaces, only. Add a bunch of tests to
exercise escape parsing.
When determining the trailing space length, we need to honor
whether spaces are escaped or not. Currently, we do not check
whether the escape itself is escaped, though, which might
generate an off-by-one in that case as we will simply treat the
space as escaped.

Fix this by checking whether the backslashes preceding the space
are themselves escaped.
In our attributes pattern parsing code, we have a comment that
states we might have to convert '\' characters to '/' to have
proper POSIX paths. But in fact, '\' characters are valid inside
the string and act as escape mechanism for various characters,
which is why we never want to convert those to POSIX directory
separators. Furthermore, gitignore patterns are specified to only
treat '/' as directory separators.

Remove the comment to avoid future confusion.
@pks-t
Copy link
Copy Markdown
Member Author

pks-t commented Jun 13, 2019

So I've changed git_path_root now to treat paths with leading backslash as absolute on Windows-based systems, only. All the other systems will now treat those as relative paths. I've refrained from modifying existing behaviour with drive-prefixes, I'd rather do that in a separate PR.

@ethomson ethomson merged commit a5ddae6 into libgit2:master Jun 13, 2019
@ethomson
Copy link
Copy Markdown
Member

Sweet, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants