Skip to content

Ensure packfiles with different contents have different names#4088

Merged
ethomson merged 1 commit into
libgit2:masterfrom
chescock:packfile-name-using-complete-hash
Jun 11, 2017
Merged

Ensure packfiles with different contents have different names#4088
ethomson merged 1 commit into
libgit2:masterfrom
chescock:packfile-name-using-complete-hash

Conversation

@chescock
Copy link
Copy Markdown
Contributor

Use the hash of the complete packfile as the filename instead of the hash of just the OIDs. This matches the behavior of core git, which changed in 2013 in commit 1190a1a. The message for that commit includes the reasoning behind the change.

I ran into a problem with the current naming in an application that occasionally performs two fetches of the same remote at the same time. The objects in the resulting packfiles are the same, so they get the same filename, but the contents are sometimes different, likely because pack building is multi-threaded. The second fetch to complete will overwrite the .idx file from the first fetch, but fail to rename the .pack file, so the .idx and .pack files don't match. Reading a branch that was just fetched will then fail with "Target OID for the reference doesn't exist on the repository".

The second commit adds error handling to the rename call so that this situation would also report an error to the client.

@chescock chescock force-pushed the packfile-name-using-complete-hash branch from 1126375 to 626b44f Compare January 23, 2017 21:39
@chescock
Copy link
Copy Markdown
Contributor Author

This got a merge conflict with #4030, which also added error checking to the p_rename call. I'll remove the second commit from this pull request.

I notice that the error handling added in #4030 just does goto on_error;, but other places that call p_rename seem to always call giterr_set first. Is that a problem?

@chescock chescock force-pushed the packfile-name-using-complete-hash branch from 626b44f to afa2b04 Compare May 18, 2017 14:15
Comment thread src/indexer.c Outdated
git_hash_final(&trailer_hash, &idx->trailer);
if (git_oid_cmp(&file_hash, &trailer_hash)) {
git_hash_final(&idx->hash, &idx->trailer);
if (git_oid_cmp(&file_hash, &idx->hash)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change would be more obviously correct if these files were kept unchanged and then replacing the git_hash_final(&idx->hash, &ctx) with a git_oid_cpy(&idx->hash, &trailer_hash).

@pks-t
Copy link
Copy Markdown
Member

pks-t commented May 19, 2017

Thanks for this change and sorry for not taking a look at it earlier.

The change itself looks correct. I think it'd be a bit easier to review though if making the proposed change to keep the git_hash_final in-tact and do an OID copy into the index later on (see my in-line comment). Please also try to honor our usual commit message format, e.g. something like the following might be good:

indexer: name pack files after trailer hash

Upstream git.git has changed the way how packfiles are named.
Previously, they were using a hash of the contained object's OIDs, which
has then been changed to use the hash of the complete packfile instead.
See 1190a1acf (pack-objects: name pack files after trailer hash,
2013-12-05) in the git.git repository for more information on this
change.

This commit changes our logic to match the behavior of core git.

Upstream git.git has changed the way how packfiles are named.
Previously, they were using a hash of the contained object's OIDs, which
has then been changed to use the hash of the complete packfile instead.
See 1190a1acf (pack-objects: name pack files after trailer hash,
2013-12-05) in the git.git repository for more information on this
change.

This commit changes our logic to match the behavior of core git.
@chescock
Copy link
Copy Markdown
Contributor Author

@pks-t Thanks! I'll make the changes you suggested and fix the commit message.

... And while trying to make the code more obviously correct, I think it turned out the old version was incorrect after all! The if (stats->local_objects > 0) branch updates trailer_hash to fix thin packs, and my change was ignoring those updates. I made sure to put the git_oid_cpy call after that code.

@chescock chescock force-pushed the packfile-name-using-complete-hash branch from afa2b04 to c0e5415 Compare May 19, 2017 14:56
@ethomson
Copy link
Copy Markdown
Member

ethomson commented Jun 8, 2017

Sorry about the delay - I didn't notice the update. I agree with the direction here, but I think that we should match the changes in git.git. In particular, we should be using the pack index's trailer - that would be the second-to-last 20 bytes of the index. Here's an index created by git.git:

> hexdump -C  pack-3b1c39521270e157f7b8a3653520702046c180ef.idx
...packfile index data here...
00000630  00 00 06 ce 00 00 05 c2  3b 1c 39 52 12 70 e1 57  |........;.9R.p.W|
00000640  f7 b8 a3 65 35 20 70 20  46 c1 80 ef f9 1d e6 ec  |...e5 p F.......|
00000650  ce 17 b3 ec 4c fd 7c 8f  43 ca 4b b5 c8 21 9a 0e  |....L.|.C.K..!..|
00000660

Note that the packfile name is 3b1c39... and the trailer (beginning at length - 40 bytes) is 3b 1c 39 ...

But with this change, libgit2 is not matching the filename to the trailer. Packing the same objects:

>hexdump -C pack-80e61eb315239ef3c53033e37fee43b744d57122.idx
...packfile data...
00000630  00 00 06 ce 00 00 06 07  7f 5f a3 62 c6 64 d6 8b  |........._.b.d..|
00000640  a7 22 12 59 be 1c bd 18  74 34 b2 f0 3c 59 dc c3  |.".Y....t4..<Y..|
00000650  a2 d3 70 1f f6 da f7 84  c6 4b 96 b4 8b c3 ce d7  |..p......K......|
00000660

Here the trailer is 7f 5f a3 .... Shouldn't our filename here be pack-7f5fa3...?

@chescock
Copy link
Copy Markdown
Contributor Author

chescock commented Jun 8, 2017

@ethomson I agree that the filename should be pack-7f5fa3... in that case. I thought that's what my change was doing...

Oh, that's the hash from the tests/pack/packbuilder.c test! That test used to expect the filename to be pack-80e61eb315239ef3c53033e37fee43b744d57122.idx, and I changed it to expect pack-7f5fa362c664d68ba7221259be1cbd187434b2f0.idx. I ran the git pack-objects command from the comment and got the pack-7f5fa3... filename, so I believe that's the same name that git.git creates.

Are you sure you ran your test with these changes? If you're packing tests/resources/testrepo.git, then the output you listed matches what I would expect without them. If you are running with these changes, does the tests/pack/packbuilder.c test pass? I would have expected the test to fail if libgit2 was still using the pack-80e61e... filename.

@ethomson
Copy link
Copy Markdown
Member

No. I'm not at all sure. Looking at your diff - yes, your explanation matches the code, and is what I was expecting. So I may have been looking at the wrong thing. Let me take another look. 👀

@ethomson
Copy link
Copy Markdown
Member

Yep, you're totally right @chescock - thanks. I'm not sure what I was looking at but your changes produce the idxs that I would expect...!

@ethomson ethomson merged commit 6f960b5 into libgit2:master Jun 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants