I have raised a Gerrit issue on the GitHub plugin (https://bugs.chromium.org/p/gerrit/issues/detail?id=11763) which is actually a JGit issue. The problem is this: how do I fetch a repo with *lots* of refs (e.g. 500k) without running out of file descriptors? PackedBatchRefUpdate.lockLooseRefs() simply loops through all the refs to be updated and creates an individual lock file for each one: that is the root cause of the issue. How was it working before? We were previously using git clone, which has a nice if/then/else that avoids locking all refs.
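To make the failure mode concrete, here is a minimal JGit sketch of the kind of fetch-all-refs operation that runs into this (the repository path and remote name are placeholders):

import java.io.File;
import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.transport.RefSpec;

// Fetch every ref from the remote, including Gerrit's refs/changes/*.
// With ~500k refs, PackedBatchRefUpdate.lockLooseRefs() creates one
// lock file per updated ref, which can exhaust the open-file limit.
public class FetchAllRefs {
    public static void main(String[] args) throws Exception {
        try (Git git = Git.open(new File("/path/to/repo"))) {
            git.fetch()
                .setRemote("origin")
                .setRefSpecs(new RefSpec("+refs/*:refs/*"))
                .call();
        }
    }
}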
Thomas implemented special handling of clone in [1] which directly writes packed-refs and avoids locking altogether during clone. Hence this issue could be fixed by implementing the --mirror option in CloneCommand to enable cloning a repository including all refs (also non-standard refs like refs/changes and refs/meta). In addition we may consider automatically updating packed-refs when a fetch updates many refs, instead of writing loose refs which need to be locked one by one.

[1] https://git.eclipse.org/r/#/c/113412/
(In reply to Matthias Sohn from comment #1)
> Thomas

and Sasa!

> implemented special handling of clone in [1] which directly writes
> packed-refs and avoids locking altogether during clone.
>
> Hence this issue could be fixed by implementing the --mirror option in
> CloneCommand to enable cloning a repository including all refs (also
> non-standard refs like refs/changes and refs/meta).

Even then: is a subsequent fetch not likely to get many changed refs/changes?

> In addition we may consider automatically updating packed-refs when a fetch
> updates many refs, instead of writing loose refs which need to be locked one
> by one.

I think fetch uses BatchRefUpdate by default already. But IIRC that still locks all the loose refs of the refs to be updated, even if they don't exist; I think to prevent issues with concurrent updates by other processes. Otherwise it'd be possible that another process writes a loose ref for some of the refs to be updated while JGit updates the packed refs.
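For context, a minimal sketch of the BatchRefUpdate path mentioned above (the ref name and object ids are placeholders):

import org.eclipse.jgit.lib.BatchRefUpdate;
import org.eclipse.jgit.lib.NullProgressMonitor;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.revwalk.RevWalk;
import org.eclipse.jgit.transport.ReceiveCommand;

// Queue ref updates and execute them as one batch. For a repository with
// packed refs this goes through PackedBatchRefUpdate, which still locks
// each affected loose ref individually to guard against concurrent writers.
public class BatchUpdateSketch {
    static void createRef(Repository repo, ObjectId newId) throws Exception {
        BatchRefUpdate batch = repo.getRefDatabase().newBatchUpdate();
        batch.addCommand(new ReceiveCommand(
                ObjectId.zeroId(), newId, "refs/changes/12/312/1"));
        try (RevWalk walk = new RevWalk(repo)) {
            batch.execute(walk, NullProgressMonitor.INSTANCE);
        }
    }
}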
(In reply to Thomas Wolf from comment #2)
> (In reply to Matthias Sohn from comment #1)
> > Thomas
>
> and Sasa!
>
> > implemented special handling of clone in [1] which directly writes
> > packed-refs and avoids locking altogether during clone.
> >
> > Hence this issue could be fixed by implementing the --mirror option in
> > CloneCommand to enable cloning a repository including all refs (also
> > non-standard refs like refs/changes and refs/meta).
>
> Even then: is a subsequent fetch not likely to get many changed refs/changes?

If the first fetch fails, the second fetch will get even more refs and will be more likely to fail than the first time. Once you get a successful fetch, then yes, the number of refs will be reduced afterwards.

> > In addition we may consider automatically updating packed-refs when a fetch
> > updates many refs, instead of writing loose refs which need to be locked one
> > by one.
>
> I think fetch uses BatchRefUpdate by default already. But IIRC that still
> locks all the loose refs of the refs to be updated, even if they don't
> exist; I think to prevent issues with concurrent updates by other processes.

Sure, DBMSs have the same problem, and typically they do a "lock escalation" to avoid creating hundreds of thousands of locks. We could detect that we are locking far too many refs and then escalate to lock the ref-space instead of the individual refs.

Example:
Lock refs/changes/12/312/1
... and 100k other similar locks
then you escalate the locks and lock the entire 'refs/changes/12/*'
(one lock for all refs that start with refs/changes/12)
... if you still hit lock overload, then you escalate to lock the repo.

At the end of the day, if you are locking 100k refs of a repo, you're basically locking most of it :-)

WDYT?
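To illustrate the escalation policy, a hypothetical sketch (this is not existing JGit API; the thresholds and names are made up):

import java.util.List;

// Hypothetical lock escalation: lock individual refs while the count is
// small, escalate to a single lock on a common prefix (e.g.
// "refs/changes/12/") above a threshold, and fall back to one
// repository-wide lock when even that would create too many locks.
public class LockEscalationSketch {
    static final int PER_REF_LIMIT = 1_000;      // illustrative threshold
    static final int PER_PREFIX_LIMIT = 100_000; // illustrative threshold

    static String chooseLockScope(List<String> refs, String commonPrefix) {
        if (refs.size() <= PER_REF_LIMIT) {
            return "per-ref";                // one lock file per ref
        } else if (refs.size() <= PER_PREFIX_LIMIT) {
            return "prefix:" + commonPrefix; // one lock for the whole subtree
        } else {
            return "repository";             // lock the entire ref space
        }
    }
}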
Does standard C git do such lock escalation?
(In reply to Thomas Wolf from comment #4)
> Does standard C git do such lock escalation?

I am not very familiar with the C implementation of Git; however, I can say that this problem doesn't happen for me when I do the same operation using the Git CLI, but it does happen using JGit.
Locking thousands of loose refs clearly won't scale. I'll check the C implementation; I think we need to implement the lock protocol followed by the canonical C implementation, since we want to be interoperable. I think we should implement both the clone --mirror option and a scalable locking protocol. As a first step I started implementing the mirror option for CloneCommand.
New Gerrit change created: https://git.eclipse.org/r/151352
New Gerrit change created: https://git.eclipse.org/r/151353
Gerrit change https://git.eclipse.org/r/151352 was merged to [master]. Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=6216b0de8a829fa764b8c8c51095cf0c5964213f
Gerrit change https://git.eclipse.org/r/151353 was merged to [master]. Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=c24eee4fa4db682b138a7a212cf711ab682d02f6
CloneCommand and the CLI clone command now support the --mirror option. Another option is to use RefTable, which has better performance and a more sophisticated locking implementation. Can we close this bug?
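A minimal usage sketch of the new option (the URL and target directory are placeholders):

import java.io.File;
import org.eclipse.jgit.api.Git;

// Mirror-clone: fetches all refs (refs/changes, refs/meta, ...) and writes
// them directly to packed-refs, so no per-ref lock files are created
// during the initial clone. setMirror(true) implies a bare repository.
public class MirrorClone {
    public static void main(String[] args) throws Exception {
        try (Git git = Git.cloneRepository()
                .setURI("https://example.com/repo.git")
                .setDirectory(new File("/path/to/repo-mirror"))
                .setMirror(true)
                .call()) {
            System.out.println("Cloned to " + git.getRepository().getDirectory());
        }
    }
}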
The fix added the --mirror option to the clone command, but the bug was in the fetch command. Is the code shared between the two? Has that also fixed the fetch use-case? Thanks @Matthias for the fix! Luca.
This should be fixed via bug 573328 now?
(In reply to Thomas Wolf from comment #13)
> This should be fixed via bug 573328 now?

I verified that https://git.eclipse.org/r/c/jgit/jgit/+/180198 fixes this problem. With this patch I could fetch all refs of the gerrit repository (41k refs) with a ulimit maxfiles of 256 in 3:51 min, compared to 2:42 min using cgit 2.31.0.
Thanks everyone for fixing this bug, much appreciated !