Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upSorting in TreeModifier.set_done() seems to be inconsistent with git's own sorting -> breaks repository #369
Comments
Thanks for the detailed and informative issue ! This is indeed a critical one, whose fix is on the way. To the interested audience, this is how git compares tree entries: int name_compare(const char *name1, size_t len1, const char *name2, size_t len2)
{
size_t min_len = (len1 < len2) ? len1 : len2;
int cmp = memcmp(name1, name2, min_len);
if (cmp)
return cmp;
if (len1 < len2)
return -1;
if (len1 > len2)
return 1;
return 0;
} Now git-python does it similarly. It's worth noting that the most difficult was to make it work in Py3 as well as in Py2, which contributes to most of the added code. |
I am hoping to hear from you if it does indeed fix the issue on your end :). |
Whoa, hadn't thought this would be this complicated. Big thanks for the quick response! I will test if the bug is fixed for our repositories, but I'm pretty certain we will have no more issues. |
Sadly I still get errors. I took one tree which produces errors and used "git show" to see what is stored in object store. The attached files correspond to the four different ways I used to create the tree:
git seems to do some additional magic? Or am I missing something? |
Thanks a lot ! With the new, improved dataset, I am sure there will soon be an actual fix. |
The problem is that a per-tree modification API cannot work properly, as the sorting is based on full paths of all entries within the repository. This feat can only be achieved by the index, which to my knowledge already does it correctly. The only fix is to remove the misleading API entirely, which will happen in the next commit. Related to #369
Please have a look at the two commits related to the issue to see if this can indeed solve the problem. Long story short: The API used here shouldn't have existed in the first place, as it's not possible to sort individual trees correctly. Instead, please resort to modifying index entries and use Of course, if this still is not working or creates invalid repositories, please let me know right away so I can have another look. Thanks for your effort so far, I found the input very valuable. |
git enforces tree objects to be sorted by name. GitPython enables users to sort their tree's by calling TreeModifier.set_done(), see:
https://github.com/gitpython-developers/GitPython/blob/master/git/objects/tree.py#L45
This sorting somehow seems to be inconsistent with git's own sorting mechanism, which means adding a tree might break the git repository. This means "git fsck" will throw errors and some remotes (gitlab for example) will no accept the commit. Anyways this bug leaves the git object store in an broken state which cannot be fixed easily, especially if there is much history after the broken commit/tree. I had to delete multiple branches from projects I manage because of their broken state so far.
About the problem itself:
GitPython uses core Python functions to sort the file list. This looks correct, but has one major difference to what git itself does. If you have two files beginning with the same chars (for example "file" and "file.second") their ordering will differ:
What git does:
What GitPython does:
We found this problem using fabdeploit for deployments using git, which allows us to add/removed files to/from the git repository. This way tree's will be modified which means sorting is essential.
Please fix this. ;-)