Skip to content

utils: make sure memchr() is not invoked with NULL#4733

Closed
neithernut wants to merge 1 commit into
libgit2:masterfrom
neithernut:fix-git__linenlen-ub
Closed

utils: make sure memchr() is not invoked with NULL#4733
neithernut wants to merge 1 commit into
libgit2:masterfrom
neithernut:fix-git__linenlen-ub

Conversation

@neithernut
Copy link
Copy Markdown
Contributor

Passing a null pointer as first argument to memchr() results in
undefined behavior.

In practice, this should not be a problem since, within libgit2,
git__linenlen() will be either called with a non-null buffer or with
both buffer and buffer_len being null. Any sane implementation of
memchr() should return NULL in this case without causing any
malicious or unintended effects.

However, it is still an UB. It will be reported by both dynamic and
static UB checkers. Hence, we add this additional check.

Resolves #4726.

@neithernut
Copy link
Copy Markdown
Contributor Author

@stewid: let me know if you want to be named in a Reported-by line.

@ethomson
Copy link
Copy Markdown
Member

I’m interested in your analysis of this problem. When are we calling this function with a null buffer? Why should we change this and not the caller?

@neithernut
Copy link
Copy Markdown
Contributor Author

As determined in the referenced issue, there are two callers, both in parse.c, both carrying out operations on a git_parse_ctx. In both cases, git__linenlen() is used for determining and setting the length (field line_len) of the next line (pointed to by field line of a git_parse_ctx). line may be NULL, e.g. if:

  • the parse context is be initialized with NULL for the buffer,
  • 0 is passed for the buffer-length to the initialization function or
  • the context is reset using git_parse_ctx_clear().

Based on this, I considered three options:

  1. Enforcing field line inside git_parse_ctx to be non-null: having NULL mean "empty buffer" is quite common, but I suspect that the context is actually always initialized with some non-null buffer which would, in turn, result in a non-null line. However, I did not check. Also, changing the semantics so that line is set to a pointer to an empty string would mean introducing multiple checks, extra code for git_parse_ctx_clear() and, most notably, would introduce an invariant rather uncommon (in C) for this type of object.
  2. Adding additional checks to the call-site: every call-site is affected, so every (existing) call-site would need some additional check. In the case of git_parse_advance_line(), this could be an early return, which would provide the additional benefit of the line_num field not being incremented. However, I assume that this does not really matter, since the line-number may be considered "undefined" anyway for a null-buffer (it's not nice, but I doubt anybody cares).
  3. Adding an additional check to git__linenlen() (as proposed in this PR): this required adding only a single check, rather than a check for every single call-site. This makes the function "safe" for the situations of interest, both for the existing and possible future uses of that function.

If desired, I can either change this PR or file an alternative PR implementing the 2nd of those approaches rather then the 3rd.

Comment thread src/util.c
char *nl = memchr(buffer, '\n', buffer_len);
char *nl;

if (!buffer)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is sensible. I'd love to have an assert(buffer || !buffer_len); here, though, such that we can catch programming errors in case anybody calls git__linenlen(NULL, n) with n > 0.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an assertion.

Passing a null pointer as first argument to `memchr()` results in
undefined behavior.

In practice, this should not be a problem since, within libgit2,
`git__linenlen()` will be either called with a non-null `buffer` or with
both `buffer` and `buffer_len` being null. Any sane implementation of
`memchr()` should return `NULL` in this case without causing any
malicious or unintended effects.

However, it is still an UB. It will be reported by both dynamic and
static UB checkers. Hence, we add this additional check. And while we're
at it, we also introduce an assertion to catch calls with a null
`buffer` and a non-null `buffer_len`.
@neithernut neithernut force-pushed the fix-git__linenlen-ub branch from 28f2749 to 6809f7d Compare July 26, 2018 17:31
@neithernut
Copy link
Copy Markdown
Contributor Author

neithernut commented Jul 27, 2018

One of the AppVeyor jobs failed due to network problems. It should be sufficient to simply restart the job.

@pks-t
Copy link
Copy Markdown
Member

pks-t commented Jul 27, 2018

I've restarted the AppVeyor build

@ethomson
Copy link
Copy Markdown
Member

Thanks for the analysis, @neithernut. The reason I asked is because few (none?) of our other string helper functions do a null check like this. (I didn't do an exhaustive analysis, I just quickly skimmed util.c.) I'd like to take a moment to do a little more digging to ensure that we don't change a function to be very dissimilar than the rest of the internal library when it comes to its preconditions.

@pks-t
Copy link
Copy Markdown
Member

pks-t commented Aug 2, 2018

While most (all?) other string functions don't do NULL checks, I find the semantics in this case to be rather self-evident: if you've got no string, the line length is clearly "0". On the other hand strlen(3P) exhibits undefined behaviour in case its parameter is NULL, which would be a case against including the NULL check in git__linelen. Also: even though we'd be safe wrt git_linelen, it might still be that we pass the NULL string to other functions afterwards, as @ethomson said.

So this all speaks against my previous thinking. So maybe we should make sure that a parser context is never being initialized with a NULL pointer?

@ethomson
Copy link
Copy Markdown
Member

ethomson commented Aug 2, 2018

So this all speaks against my previous thinking. So maybe we should make sure that a parser context is never being initialized with a NULL pointer?

That makes sense to me.

@neithernut
Copy link
Copy Markdown
Contributor Author

In this case, I'll prepare patches implementing approach 2 (changing git_parse_*). Probably over the weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants