do not lose folders after rename and checkout #34

Open
chrisd8088 opened this issue Jul 2, 2019 · 1 comment

@chrisd8088 chrisd8088 commented Jul 2, 2019

The following currently fails on Linux (assuming Foo is an un-hydrated folder present in the repository), because although Bar is removed by the final git checkout, Foo is not restored:

git checkout -b foo
git mv Foo Bar
git add .
git commit -m foo
git checkout master

@chrisd8088 chrisd8088 commented Jul 27, 2019

This issue exposes a significant and somewhat complex difference between the current Windows and Mac implementations. The fundamental challenge is how to track not-yet-fully-hydrated directories and files if any of their ancestor directories are renamed.

In the Windows case, the git mv would simply be flatly denied, as would a mv -- projected folders are (a) never converted to "full", regular folders, and (b) never allowed to be renamed. (Regular folders, created by the user, can be renamed.) The advice given to Windows VFSForGit users who want to rename a directory is just to copy it and then delete the old one. This restriction effectively avoids any possibility that a projected folder or file could be "lost" because one of its ancestors changes name. A relevant Windows-only functional test demonstrates this restriction.

On Mac, however, projected directories are converted to full, regular ones once all their immediate children have been enumerated. The consequence is that they can then be renamed, which could cause any not-yet-projected descendants lower in their hierarchy to be "lost" -- i.e., their new paths would not correspond to any paths the provider was able to retrieve from its upstream data source.

So on the Mac, when a rename event is detected on a directory, the kext first sends a request to the PrjFSLib user-space library to recursively enumerate all descendants of the directory. This is an unbounded operation in terms of time, obviously, but ensures that no descendants will be lost as a result of the rename.
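As a rough illustration of why this expansion is unbounded, here is a minimal in-memory sketch of "expand everything, then rename". The `UPSTREAM` table and both function names are hypothetical stand-ins invented for this example; they are not the kext or PrjFSLib interfaces.

```python
# Hypothetical model: UPSTREAM maps each projected directory to the names
# of its children as known to the provider's upstream data source.
UPSTREAM = {
    "Foo": ["a.txt", "sub"],
    "Foo/sub": ["b.txt"],
}

def expand_recursively(path, on_disk):
    """Place every descendant of `path` on disk before a rename proceeds."""
    for name in UPSTREAM.get(path, []):
        child = f"{path}/{name}"
        on_disk.add(child)
        # The work grows with the size of the whole subtree, which is why
        # the operation is unbounded in time.
        expand_recursively(child, on_disk)

def rename_directory(old, new, on_disk):
    """Expand first, then move every on-disk entry under the new name."""
    expand_recursively(old, on_disk)
    renamed = set()
    for entry in on_disk:
        if entry == old or entry.startswith(old + "/"):
            renamed.add(new + entry[len(old):])
        else:
            renamed.add(entry)
    return renamed

# Only "Foo" itself is on disk before the rename.
disk = rename_directory("Foo", "Bar", {"Foo"})
```

Because every descendant is materialized before the move, nothing under `Bar` depends on a path the upstream source no longer recognizes.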

Another factor to consider is that on both systems, it is possible -- even necessary -- for the GVFS provider (unlike MirrorProvider) to re-create new projected, not-yet-hydrated directories and files outside of the context of a parent directory's enumeration. Quoting @wilbaker:

The scenario I’m referring to is one like this:

  1. User is on branch_1
  2. User modifies x/y/z/foo.txt
    a. x/y/z/foo.txt is added to ModifiedPaths.dat
    b. x/y/z also has a file named bar.txt that the user did not touch
  3. User adds/commits the change to x/y/z/foo.txt
  4. User checks out branch_2, and in branch_2 folder x does not exist
  5. User checks out branch_1 again
    a. At the start of the checkout, folder x is not on disk (and on Windows, VFS4G is not projecting it)
    b. Git creates x/y/z/foo.txt (and all the required intermediate folders)
    c. Git does not create x/y/z/bar.txt (it was never added to ModifiedPaths.dat)

After the checkout in step 5 completes, the user should see x/y/z/bar.txt. If VFS4G did nothing special when git created x/y/z/foo.txt then x/y/z/bar.txt would not be visible to the user because x, y, and z would all be regular full folders. To ensure that x/y/z/bar.txt is visible:

On Windows: x, y, and z are all converted to partial. That way VFS4G will get a callback when x/y/z is enumerated and it can inject bar.txt into the results.
On Mac: x, y, and z are marked as needing re-expansion. When the git command completes (and the post-command hook is waiting on VFS4G), VFS4G will re-expand x, y, and z and make sure all of their children are on disk (including a placeholder for x/y/z/bar.txt).

Note that on Linux we have currently inherited the Mac logic of marking directories as needing re-expansion.
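The Mac-style re-expansion in the quoted scenario could be modelled roughly as follows. This is only a sketch: `PROJECTION`, `git_creates`, and `post_command` are hypothetical names for this example, and the real provider tracks this state differently.

```python
# Hypothetical model: PROJECTION lists the children each folder should
# have on the checked-out branch.
PROJECTION = {
    "x": ["y"],
    "x/y": ["z"],
    "x/y/z": ["foo.txt", "bar.txt"],
}

def git_creates(path, on_disk, needs_reexpansion):
    """Record a file git wrote, flagging each intermediate folder it created."""
    parts = path.split("/")
    for i in range(1, len(parts)):
        folder = "/".join(parts[:i])
        if folder not in on_disk:
            on_disk.add(folder)
            needs_reexpansion.add(folder)
    on_disk.add(path)

def post_command(on_disk, needs_reexpansion):
    """Re-expand flagged folders so every projected child has a placeholder."""
    for folder in sorted(needs_reexpansion):
        for name in PROJECTION.get(folder, []):
            on_disk.add(f"{folder}/{name}")
    needs_reexpansion.clear()

disk, pending = set(), set()
git_creates("x/y/z/foo.txt", disk, pending)   # step 5b of the scenario
visible_before = "x/y/z/bar.txt" in disk      # bar.txt is missing here
post_command(disk, pending)                   # post-command hook
visible_after = "x/y/z/bar.txt" in disk
```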

Given these constraints, we are faced with one of two or maybe three options:

  1. Windows model: Never convert hydrated directories to full, reject all attempts to rename non-regular (i.e., non-full) directories, and convert parent directories to hydrated-but-not-full (i.e., "partial" in the Windows sense) when a new un-hydrated directory is created outside of an enumeration callback.

  2. Mac model: Recursively expand all descendants before allowing a rename to proceed, which may take an unknown amount of time, and continue to mark directories as needing re-expansion.

  3. Possible hybrid model (see below): Using reference counts stored in extended attributes on hydrated directories, track when they can be converted to full because all of their descendants have been fully hydrated; reject attempts to rename directories which are not full (like on Windows); and when creating new projected files and directories outside of an enumeration callback context, convert any full ancestor directories back to hydrated, non-full ones with a reference count again.

This final, somewhat hypothetical option might be able to combine the best of both worlds: it avoids unbounded recursion, at the cost of rejecting rename operations when any descendant is not yet populated (not ideal, but at least a clean model which matches the Windows implementation), while still allowing directories to become full ones where we can.

The key would have to be that libprojfs would need to reject rename operations on directories in a non-full state, and directories would need to be moved into a populated/hydrated state after being enumerated, rather than being moved directly to the full state as they are now. A hydrated directory would have a second xattr, user.projection.count, which would be incremented for each direct, non-full child which was created during enumeration. As files and sub-directories were fully projected, the count would be decremented, and if it reached zero, the directory would be moved to the full state, and its parent's count would be decremented (potentially recursing upwards, but never downwards).
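A minimal in-memory sketch of this counting scheme, assuming three states ("projected", "hydrated", "full") and a plain integer field standing in for the `user.projection.count` xattr. The `Node` class and function names are hypothetical and exist only for this example.

```python
class Node:
    """One file or directory; `count` stands in for user.projection.count."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.state = "projected"   # not yet enumerated or hydrated
        self.count = 0

def mark_full(node):
    """Move a node to the full state and propagate upwards only."""
    node.state = "full"
    parent = node.parent
    if parent is not None:
        parent.count -= 1
        if parent.count == 0:
            mark_full(parent)      # recursion goes up, never down

def enumerate_dir(node, child_names):
    """Hydrate a directory: create one non-full child per name and count them."""
    children = [Node(name, node) for name in child_names]
    node.state = "hydrated"
    node.count = len(children)
    if node.count == 0:
        mark_full(node)            # an empty directory has nothing left to wait for
    return children

def can_rename(node):
    """Renames are rejected for anything not in the full state."""
    return node.state == "full"

root = Node("Foo")
a, sub = enumerate_dir(root, ["a.txt", "sub"])
blocked = can_rename(root)         # root is hydrated with a count of 2
mark_full(a)                       # a.txt is fully projected
(b,) = enumerate_dir(sub, ["b.txt"])
mark_full(b)                       # sub reaches zero, then root does
allowed = can_rename(root)
```

Treating an empty directory as immediately full is an assumption of this sketch; the text above does not say how that case would be handled.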

When a new projected file or directory was created outside of an enumeration callback, we would need to first traverse the path down from the mount point, marking each full ancestor directory (if any) as non-full, with a count of zero. Then the requested file or directory would be created, and the path traversed upwards again, incrementing the count on each ancestor.
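The down-then-up traversal might look something like the sketch below, using a hypothetical dictionary from path to state and count. One assumption to flag: the upward pass here increments a parent only when the child beneath it was newly created or newly demoted from full, so that each count still equals the number of direct non-full children. The description above says only that the count is incremented on each ancestor.

```python
def ancestors(path):
    """Return the ancestor paths of `path`, topmost first."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts))]

def create_projected(path, tree):
    """Create a non-full entry at `path` outside of an enumeration callback."""
    newly_non_full = {path}
    # Downward pass: demote each full ancestor to non-full with a zero count.
    for anc in ancestors(path):
        if tree[anc]["state"] == "full":
            tree[anc] = {"state": "hydrated", "count": 0}
            newly_non_full.add(anc)
    tree[path] = {"state": "projected", "count": 0}
    # Upward pass: count each newly non-full child in its parent.
    child = path
    for anc in reversed(ancestors(path)):
        if child in newly_non_full:
            tree[anc]["count"] += 1
        child = anc

# Hypothetical starting state: x already has one non-full child (x/other),
# while x/y and x/y/z are full.
tree = {
    "x": {"state": "hydrated", "count": 1},
    "x/other": {"state": "projected", "count": 0},
    "x/y": {"state": "full", "count": 0},
    "x/y/z": {"state": "full", "count": 0},
}
create_projected("x/y/z/bar.txt", tree)
```

After the call, `x/y` and `x/y/z` are non-full again and so cannot be renamed until `bar.txt` is fully projected.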

One nice property of this logic would be that, once a directory was marked non-full, it could not be renamed -- and so race conditions with other rename operations would be impossible; we couldn't "lose" our ancestors if an intermediate directory was renamed while we were creating the new projected file, for example. We could race with deletion operations, but that would be acceptable, as deletion can occur at any time. If creating the new projected file or directory failed, or some other error occurred, we would want to try recursing back upwards, making the ancestors full again and removing the zero-value count xattr. This recursion could, of course, experience its own failures, but the worst-case scenario would be that we could leave some directories marked as non-full when they could, possibly, be converted to full.

Deletion of non-full files and directories would also need to decrement their parent's count value, and possibly recurse upwards if the parent became full as a result.
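The deletion case could be sketched along the same lines, again with a hypothetical path-to-state dictionary rather than real xattrs. For brevity the sketch removes a single entry and does not model removing a directory's own children.

```python
def delete_entry(path, tree):
    """Remove one entry; if it was non-full, update ancestor counts upwards."""
    entry = tree.pop(path)
    if entry["state"] == "full":
        return  # full entries were never counted by their parent
    parent = path.rpartition("/")[0]
    while parent:
        tree[parent]["count"] -= 1
        if tree[parent]["count"] > 0:
            break  # parent still has other non-full children
        # Parent has no non-full children left, so it becomes full and
        # its own parent must be updated in turn.
        tree[parent]["state"] = "full"
        parent = parent.rpartition("/")[0]

# Hypothetical starting state: bar.txt is the only non-full entry.
tree = {
    "x": {"state": "hydrated", "count": 1},
    "x/y": {"state": "hydrated", "count": 1},
    "x/y/bar.txt": {"state": "projected", "count": 0},
}
delete_entry("x/y/bar.txt", tree)
```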
