Garbage collection: Free mostly everything automatically#1570
Conversation
- Turns on selfFreeing for basically everything, including the repository - Ignores all git_*_free calls - Adds ownerFn info used to make associative ownership links between objects that have an exposed link to the repository - Added isSingleton flag for repository, this adds reference counting to the repo address externally so that no matter how many v8 objects point to it, only the last GC'd v8 object will free the repository. - Added an "ownedBy" field for arguments to enable ownership linking to arbitrary parameters, not just "this" - Some additional clean up of things that don't exist or shouldn't exist
- Implements isSingleton behavior with reference counting - Implements ownerFn, which ties an arbitrary pointer to a singleton type pointer - always calls free, even if object is "owned", the ownership link is about keeping internal pointers alive, but it seems that most calls to free are reference counted in libgit2, but the repository has a different relationship than all other types of ref counted frees.
I know we will need this in the future for merge drivers and the like. But for now, this will at least show how it could be done.
|
This tests really well on linux locally, but I kinda want to see if it's stable across platforms. |
|
@srajko If you're out there, I will have a more comprehensive write up to how I've extended your work, tomorrow. |
It looks like this is pretty important to preserve, as it broke tests on windows immediately when removing it. Because we are no longer exposing free methods in javascript land, we will need to manually perform this method at the C++ layer.
df27e1f to
335cc01
Compare
|
Also, It looks like osx is failing on one test, but because it can't find unordered_map in node 6. @stevex86 suggests there is a flag I can utilize to imagine that into existence. |
|
I don't have OSX available, so @stevex86 is gonna help me fix the unordered_map issue. |
|
Looks like we will need to revisit git_buf as it really does not behave the same as other git_*_free calls. See https://github.com/libgit2/libgit2/blob/master/include/git2/buffer.h#L22 |
Changes minimum support version of MacOS to 10.9.
|
I got the Node 6 target to build on MacOS. This does change minimum support version of MacOS to from 10.7 (circa 2011) to 10.9 (circa 2013), but that shouldn't be a problem. |
1a2012a to
337add8
Compare
co-authored-by: Steve King <[email protected]>
co-authored-by: Steve King <[email protected]> We walked the entire file and compared documentation + struct implementation to try and glean ownership relationships between out parameters and arguments. Since we had to walk this whole file, and it took about 2 1/2 days, we also spent a little bit of time touching up groupings, definitions, and ignore settings.
337add8 to
1e1b639
Compare
|
👏 I'll dig in more today. One question now though, is if we should opt folks in, or if certain tasks are better to use free manually? If there is no case where calling free is better that automatic gc, we should probably think of removing those methods from being exported. |
|
I've actually removed the capacity for free functions to be called. It's all automatic if this PR lands. |
|
So this is feeling pretty good when I'm testing it in GK. I wonder if we should consider merging this and doing a 0.24.0-alpha.1? |
|
So, I haven't experimented with it too thoroughly, but I've looked through it a bunch and it looks good. I'd say we ship 0.23.0 first, and then land this in an 0.24 alpha. |
|
Yeah, I'm good with landing this once we cut 0.23 |
Garbage Collection in LibGit2
So this is as much as I could glean.
Memory models
Reference counted
Most objects that exist in the repository ODB or that extend from object are cached and reference counted. In order to ensure we are cleaning up memory for reference counted libgit2 objects, we should always call the provided libgit2 free method on the pointers when v8 garbage collects them. A lot of these reference counted objects have an underlying memory dependency with the repository, so the repository pointer will need to be kept around until the last call to the last object related to the repository has been freed. This means that most reference counted objects are owned by the repository. This class of objects will always call the free function, even if they have an owner.
Borrowed pointers
Borrowed pointers are like fields on a struct. They should never be freed. Instead, we should build an owner dependency on the v8 object that contains the struct that this borrowed pointer came from. This ownership dependency will keep the memory alive for as long as the borrowed pointer is alive.
Borrowed, duplicable pointers
Certain borrowed pointers (git_oid, git_signature...) are easily duplicable and may have provided duplication functions from libgit2. They may also be easily copied using memcpy. If a pointer is borrowed, but duplicable, no ownership relationship should be established. Instead, we should duplicate the memory and always free when v8 garbage collects us.
Non reference counted, non duplicable, mega owners
This mostly applies to the repository. The repository has a huge network of underlying relationships between the moment you git_repository_open. Almost every reference counted object has an underlying relationship with the repository via an odb cache or just plain having a field pointing back to the repository. Most reference counted objects will form an owned relationship with the repository. Unfortunately, there are several ways to get back to the same repository pointer that you got when you performed git_repository_open. That means that we need to keep track of how many times we've seen the repository. We now have an external reference counting mechanism to keep track of pointers like the repository. As new v8 objects get created that wrap the repository pointer, we reference count. When every v8 object that wraps the repository has been garbage collected, the last one to be collected will also perform the free method on the repository.
I've currently flagged these as isSingleton and refer to them as such, but that is a silly name and I am open to suggestions (if I don't find a better name myself 😄).
Additional behavior
OwnerFn
I've added a new field to the ownerFn in the descriptor.json. An example of the ownerFn is
git_commit_owner. You pass it a commit pointer and it will give you the repository pointer that it is bound to. We now utilize these functions to generate a persistent handle to the repository. When a new commit is created, we bind it to the repository via git_commit_owner. As mentioned above, we will create a new v8 object wrapper around the repository handle and reference count it. This will at least guarantee that even though it's a different v8 object wrapper that we persist, it's still persisting the important underlying libgit2 pointer.ownedBy
You can now specify ownedBy fields in arguments to instruct the descriptor that a particular argument from JS is the owner of a return result. This extends ownedByThis behavior by being more open to other ownership configurations than a this relationship.
Need to do still
Future work