Skip to content

Garbage collection: Free mostly everything automatically#1570

Merged
implausible merged 13 commits into
nodegit:masterfrom
implausible:garbage-collection/free-almost-everything
Oct 25, 2018
Merged

Garbage collection: Free mostly everything automatically#1570
implausible merged 13 commits into
nodegit:masterfrom
implausible:garbage-collection/free-almost-everything

Conversation

@implausible
Copy link
Copy Markdown
Member

@implausible implausible commented Oct 17, 2018

Garbage Collection in LibGit2

So this is as much as I could glean.

Memory models

Reference counted

Most objects that exist in the repository ODB or that extend from object are cached and reference counted. In order to ensure we are cleaning up memory for reference counted libgit2 objects, we should always call the provided libgit2 free method on the pointers when v8 garbage collects them. A lot of these reference counted objects have an underlying memory dependency with the repository, so the repository pointer will need to be kept around until the last call to the last object related to the repository has been freed. This means that most reference counted objects are owned by the repository. This class of objects will always call the free function, even if they have an owner.

Borrowed pointers

Borrowed pointers are like fields on a struct. They should never be freed. Instead, we should build an owner dependency on the v8 object that contains the struct that this borrowed pointer came from. This ownership dependency will keep the memory alive for as long as the borrowed pointer is alive.

Borrowed, duplicable pointers

Certain borrowed pointers (git_oid, git_signature...) are easily duplicable and may have provided duplication functions from libgit2. They may also be easily copied using memcpy. If a pointer is borrowed, but duplicable, no ownership relationship should be established. Instead, we should duplicate the memory and always free when v8 garbage collects us.

Non reference counted, non duplicable, mega owners

This mostly applies to the repository. The repository has a huge network of underlying relationships between the moment you git_repository_open. Almost every reference counted object has an underlying relationship with the repository via an odb cache or just plain having a field pointing back to the repository. Most reference counted objects will form an owned relationship with the repository. Unfortunately, there are several ways to get back to the same repository pointer that you got when you performed git_repository_open. That means that we need to keep track of how many times we've seen the repository. We now have an external reference counting mechanism to keep track of pointers like the repository. As new v8 objects get created that wrap the repository pointer, we reference count. When every v8 object that wraps the repository has been garbage collected, the last one to be collected will also perform the free method on the repository.
I've currently flagged these as isSingleton and refer to them as such, but that is a silly name and I am open to suggestions (if I don't find a better name myself 😄).

Additional behavior

OwnerFn

I've added a new field to the ownerFn in the descriptor.json. An example of the ownerFn is git_commit_owner. You pass it a commit pointer and it will give you the repository pointer that it is bound to. We now utilize these functions to generate a persistent handle to the repository. When a new commit is created, we bind it to the repository via git_commit_owner. As mentioned above, we will create a new v8 object wrapper around the repository handle and reference count it. This will at least guarantee that even though it's a different v8 object wrapper that we persist, it's still persisting the important underlying libgit2 pointer.

ownedBy

You can now specify ownedBy fields in arguments to instruct the descriptor that a particular argument from JS is the owner of a return result. This extends ownedByThis behavior by being more open to other ownership configurations than a this relationship.

Need to do still

  • Walk through documentation and ensure ownedBy is set on arguments at the appropriate place
  • Callback memory management is still set to never free callback parameters. We will need to find out how to fish ownership details via the async payload. I am not sure if I want to complete this in this PR or push it into the future, as I imagine callbacks are a much smaller memory foot print than being able to free the repository.
  • Write additional segfault tests like the ones seen in revwalk for the config, which is borderline similar to the repository, but not quite as complex.
  • Write additional segfault tests in general utilizing the GC.

Future work

  • Maybe callback memory managent, depends on if we can get something simple done in this PR.
  • Complex owner scenarios like git_filter. It would be nice if instead of having to write a full template, we would be capable of patching the HandleOKCallback with a custom owner resolver.
  • It would be nice if we also had an overwrite block specifically targeting function calls in AsyncWorker::Execute. For an example of what I am referring to, check out generate/templates/manual/clone/clone.cc.

- Turns on selfFreeing for basically everything, including the repository
- Ignores all git_*_free calls
- Adds ownerFn info used to make associative ownership links between objects that have an exposed link to the repository
- Added isSingleton flag for repository, this adds reference counting to the repo address externally so that no matter how many v8 objects point to it, only the last GC'd v8 object will free the repository.
- Added an "ownedBy" field for arguments to enable ownership linking to arbitrary parameters, not just "this"
- Some additional clean up of things that don't exist or shouldn't exist
- Implements isSingleton behavior with reference counting
- Implements ownerFn, which ties an arbitrary pointer to a singleton type pointer
- always calls free, even if object is "owned", the ownership link is about keeping internal pointers alive, but it seems that most calls to free are reference counted in libgit2, but the repository has a different relationship than all other types of ref counted frees.
I know we will need this in the future for merge drivers and the like. But for now, this will at least show how it could be done.
@implausible
Copy link
Copy Markdown
Member Author

This tests really well on linux locally, but I kinda want to see if it's stable across platforms.

@implausible implausible requested a review from srajko October 17, 2018 01:29
@implausible
Copy link
Copy Markdown
Member Author

@srajko If you're out there, I will have a more comprehensive write up to how I've extended your work, tomorrow.

It looks like this is pretty important to preserve, as it broke tests on windows immediately when removing it.

Because we are no longer exposing free methods in javascript land, we will need to manually perform this method at the C++ layer.
@implausible implausible force-pushed the garbage-collection/free-almost-everything branch from df27e1f to 335cc01 Compare October 17, 2018 16:18
@implausible
Copy link
Copy Markdown
Member Author

@maxkorp @tbranyen @srajko I've updated this with a big ol' write up describing what I'm doing.

@ethomson @carlosmn If either of you could fact check me on my GC write up, that would be beneficial.

@implausible
Copy link
Copy Markdown
Member Author

Also, It looks like osx is failing on one test, but because it can't find unordered_map in node 6. @stevex86 suggests there is a flag I can utilize to imagine that into existence.

@implausible
Copy link
Copy Markdown
Member Author

I don't have OSX available, so @stevex86 is gonna help me fix the unordered_map issue.

@implausible
Copy link
Copy Markdown
Member Author

Looks like we will need to revisit git_buf as it really does not behave the same as other git_*_free calls. See https://github.com/libgit2/libgit2/blob/master/include/git2/buffer.h#L22

Changes minimum support version of MacOS to 10.9.
@emmax86
Copy link
Copy Markdown
Collaborator

emmax86 commented Oct 17, 2018

I got the Node 6 target to build on MacOS. This does change minimum support version of MacOS to from 10.7 (circa 2011) to 10.9 (circa 2013), but that shouldn't be a problem.

@emmax86 emmax86 force-pushed the garbage-collection/free-almost-everything branch from 1a2012a to 337add8 Compare October 19, 2018 01:20
implausible and others added 2 commits October 19, 2018 09:28
co-authored-by: Steve King <[email protected]>
co-authored-by: Steve King <[email protected]>

We walked the entire file and compared documentation + struct implementation to try and glean ownership relationships between out parameters and arguments.

Since we had to walk this whole file, and it took about 2 1/2 days, we also spent a little bit of time touching up groupings, definitions, and ignore settings.
@implausible implausible force-pushed the garbage-collection/free-almost-everything branch from 337add8 to 1e1b639 Compare October 19, 2018 16:34
@tbranyen
Copy link
Copy Markdown
Member

👏 I'll dig in more today. One question now though, is if we should opt folks in, or if certain tasks are better to use free manually? If there is no case where calling free is better that automatic gc, we should probably think of removing those methods from being exported.

@implausible
Copy link
Copy Markdown
Member Author

I've actually removed the capacity for free functions to be called. It's all automatic if this PR lands.

@implausible implausible changed the title [wip] Garbage collection: Free mostly everything automatically Garbage collection: Free mostly everything automatically Oct 22, 2018
@implausible
Copy link
Copy Markdown
Member Author

So this is feeling pretty good when I'm testing it in GK. I wonder if we should consider merging this and doing a 0.24.0-alpha.1?

@maxkorp
Copy link
Copy Markdown
Collaborator

maxkorp commented Oct 22, 2018

So, I haven't experimented with it too thoroughly, but I've looked through it a bunch and it looks good.

I'd say we ship 0.23.0 first, and then land this in an 0.24 alpha.

@maxkorp
Copy link
Copy Markdown
Collaborator

maxkorp commented Oct 24, 2018

Yeah, I'm good with landing this once we cut 0.23

@implausible implausible merged commit 2274679 into nodegit:master Oct 25, 2018
@implausible implausible deleted the garbage-collection/free-almost-everything branch October 25, 2018 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants