Skip to content

Fix git_tree_entry double free#1332

Merged
implausible merged 4 commits into
nodegit:masterfrom
kurtb:master
Aug 30, 2017
Merged

Fix git_tree_entry double free#1332
implausible merged 4 commits into
nodegit:masterfrom
kurtb:master

Conversation

@kurtb
Copy link
Copy Markdown
Contributor

@kurtb kurtb commented Jul 26, 2017

git_tree_entry is marked as self freeing. But libgit2 only requires free'ing these in cases where the entry is retrieved from git_tree_entry_dup or git_tree_entry_bypath. Neither of which is exposed by nodegit.

This can lead to memory corruption and/or double free. This PR updates the default to make this not self freeing.

It also updates the treebuilder side to mark itself as the owner. The docs indicate https://libgit2.github.com/libgit2/#HEAD/group/treebuilder/git_treebuilder_insert that the returned tree_entry in a git_treebuilder_insert may not be valid past the next builder call. I couldn't find a way with the current generator to either fill this field in with NULL to not return the entry at all. Or duplicate it prior to returning.

kurtb added 2 commits July 26, 2017 12:28
The libgit2 use cases all are instances where the git_tree_entry is owned by the object that returned it. As such this marks the tree_entry as not self freeing. And also updates functions returning one of these to set ownedByThis. This will keep the owning object alive should someone have a reference to a tree_entry.

The git_treebuilder_insert method needed to be marked as not async in order to have access to the owning object. This seems like a case where the generator could be improved should this method need to be async.
@tommoor
Copy link
Copy Markdown

tommoor commented Jul 31, 2017

👏 would love to see this land!

@implausible
Copy link
Copy Markdown
Member

Hmm... when we set GitTreeEntry to selfFreeing, we also provide a duplicate function, which means that if the GitTreeEntry is owned, we duplicate the structure when we initialize a new wrapper for the git_tree_entry.... If I recall correctly, there is a system in place for declaring ownership over a GitTreeEntry, and if there are any double frees, we should investigate instead in the traits system and/or expand our knowledge of ownership via the generator, not just remove selfFreeing from GitTreeEntry. These leaks add up.

@implausible
Copy link
Copy Markdown
Member

I guess even further, if we can't find settings that provide the right ownership context to the tree entry on tree builder, we should expand the generator.

@kurtb
Copy link
Copy Markdown
Contributor Author

kurtb commented Aug 4, 2017

The traits system actually seems to be accomplishing that here - just not correctly configured for the tree_builder + tree_entry. For instance the tree instance's git_tree_entry_byid sets the owner to be the tree. The PR follows the same pattern by setting the tree_builder as the owner.

At that point a reference to the JS tree entry will keep the underlying git_tree_entry alive due to the JS reference to its owner. But dropping a ref to the JS tree entry won't cause the git_tree_entry to be free'd.

But in both the tree and tree_builder cases the tree_entry should not be self freeing per libgit2 docs.

Having the duplicate function get called when ownedByThis is set could be an option as well. But would result in extra allocations. Plus you'd still need to know whether what you're duplicating should be freed or is relying on someone else to free. So I'd still expect setting self freeing to false.

So removing the self freeing flag plus setting the owner seems correct in the context of the generator?

My only suggestion for the generator would be to support the ownedByThis flag in an async context. But I'm not sure what nodegit's architecture is for when to be async vs. not.

@implausible
Copy link
Copy Markdown
Member

So I'm hesitant to merge something like this, because it adequately fixes the double free, but reintroduces leakage, which is a big problem in nodegit. I think I'd rather focus on addressing the deficits in the generator.

@kurtb
Copy link
Copy Markdown
Contributor Author

kurtb commented Aug 4, 2017

So from the libgit2 docs the only time you need to free a git_tree_entry is after a call to git_tree_entry_dup or git_tree_entry_bypath.

When I made the initial PR I didn't think nodegit made use of either of these (I was probably only looking at the desciptor.js or doing my search over a partially built repo) so I thought a blanket disabling of self freeing was fine.

With this PR both of the above calls will lead to leaks. But the other cases I believe will not and also fix the double free. In those cases the owning libgit2 object owns freeing the git_tree_entry. And the ownedByThis reference should keep the owner alive as long as someone has a JS reference to a tree_entry.

The double free, in my use case of nodegit, makes it unusable since I can't use a Treebuilder without corrupting the memory of my app. But getting in a fix that avoids the double free while at the same time not leaking seems way better.

So how about something like this. Since the majority of cases treat a git_tree_entry as a non self-freeing object we still keep this turned off. But then for the git_tree_entry_dup and git_tree_entry_bypath we update the generator to mark these as overrides of the self freeing flag? I think the generator may even support that today?

An alternative would be to mark the cases where the object was not self freeing. And call its dup function in all those instances. I believe this fix would require similar changes as the other one. But would then have any JS tree entry be self freeing. Since the generator already supports marking an owner object, and this avoids an extra allocation, I'd prefer the former over this one.

Fix up ownership on all tree_entry related calls
@kurtb
Copy link
Copy Markdown
Contributor Author

kurtb commented Aug 16, 2017

Did the above except flipped back to self freeing being true by default.

@implausible
Copy link
Copy Markdown
Member

Would you mind adding a test to cover the case where you were experiencing the double free? That will prevent regression in the future. 👍

@kurtb
Copy link
Copy Markdown
Contributor Author

kurtb commented Aug 16, 2017

Added in a test to validate the memory problems from the Treebuilder insert. I have a stress test that can trigger a segfault from the double free. But this seemed overkill relative to the other tests. And not particularly deterministic. Instead leveraged the leak test util to validate the reference count matches the expected values.

@kurtb kurtb closed this Aug 16, 2017
@kurtb kurtb reopened this Aug 16, 2017
@kurtb
Copy link
Copy Markdown
Contributor Author

kurtb commented Aug 16, 2017

The generator will duplicate the value if an owner is specified. So even though the native object isn't self-freeing, it will become it once duplicated. So can use this in the unit test.

@implausible
Copy link
Copy Markdown
Member

Yes, that sounds correct.

@kurtb
Copy link
Copy Markdown
Contributor Author

kurtb commented Aug 24, 2017

Anything else needed prior to a merge? Looking forward to seeing this in so I can go back to the official npm package.

@implausible implausible merged commit 952bf8d into nodegit:master Aug 30, 2017
@implausible
Copy link
Copy Markdown
Member

We will have a new release soon (<2 weeks), we've got another fix I'd like to get in with this that addresses another crash in blobs. @kurtb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants