Ensure packfiles with different contents have different names#4088
Conversation
1126375 to
626b44f
Compare
|
This got a merge conflict with #4030, which also added error checking to the I notice that the error handling added in #4030 just does |
626b44f to
afa2b04
Compare
| git_hash_final(&trailer_hash, &idx->trailer); | ||
| if (git_oid_cmp(&file_hash, &trailer_hash)) { | ||
| git_hash_final(&idx->hash, &idx->trailer); | ||
| if (git_oid_cmp(&file_hash, &idx->hash)) { |
There was a problem hiding this comment.
The change would be more obviously correct if these files were kept unchanged and then replacing the git_hash_final(&idx->hash, &ctx) with a git_oid_cpy(&idx->hash, &trailer_hash).
|
Thanks for this change and sorry for not taking a look at it earlier. The change itself looks correct. I think it'd be a bit easier to review though if making the proposed change to keep the |
Upstream git.git has changed the way how packfiles are named. Previously, they were using a hash of the contained object's OIDs, which has then been changed to use the hash of the complete packfile instead. See 1190a1acf (pack-objects: name pack files after trailer hash, 2013-12-05) in the git.git repository for more information on this change. This commit changes our logic to match the behavior of core git.
|
@pks-t Thanks! I'll make the changes you suggested and fix the commit message. ... And while trying to make the code more obviously correct, I think it turned out the old version was incorrect after all! The |
afa2b04 to
c0e5415
Compare
|
Sorry about the delay - I didn't notice the update. I agree with the direction here, but I think that we should match the changes in git.git. In particular, we should be using the pack index's trailer - that would be the second-to-last 20 bytes of the index. Here's an index created by git.git: Note that the packfile name is But with this change, libgit2 is not matching the filename to the trailer. Packing the same objects: Here the trailer is |
|
@ethomson I agree that the filename should be Oh, that's the hash from the Are you sure you ran your test with these changes? If you're packing |
|
No. I'm not at all sure. Looking at your diff - yes, your explanation matches the code, and is what I was expecting. So I may have been looking at the wrong thing. Let me take another look. 👀 |
|
Yep, you're totally right @chescock - thanks. I'm not sure what I was looking at but your changes produce the |
Use the hash of the complete packfile as the filename instead of the hash of just the OIDs. This matches the behavior of core git, which changed in 2013 in commit 1190a1a. The message for that commit includes the reasoning behind the change.
I ran into a problem with the current naming in an application that occasionally performs two fetches of the same remote at the same time. The objects in the resulting packfiles are the same, so they get the same filename, but the contents are sometimes different, likely because pack building is multi-threaded. The second fetch to complete will overwrite the
.idxfile from the first fetch, but fail to rename the.packfile, so the.idxand.packfiles don't match. Reading a branch that was just fetched will then fail with "Target OID for the reference doesn't exist on the repository".The second commit adds error handling to the rename call so that this situation would also report an error to the client.