Bug 552173 - Too many open files when fetching repos with lots of refs
Summary: Too many open files when fetching repos with lots of refs
Status: RESOLVED FIXED
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 5.3.5   Edit
Hardware: PC Mac OS X
: P3 major (vote)
Target Milestone: 5.12   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-10-17 07:43 EDT by Luca Milanesio CLA
Modified: 2021-05-07 19:57 EDT (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luca Milanesio CLA 2019-10-17 07:43:54 EDT
I have raised a Gerrit issue on the GitHub plugin (https://bugs.chromium.org/p/gerrit/issues/detail?id=11763) which is actually a JGit issue.

The problem is this one: how do I fetch a repo with *lots* of refs (e.g. 500k) without running out of file descriptors?

In PackedBatchRefUpdate.lockLooseRefs() it simply loops through all the refs to be updated and create an individual lock file: that is the root cause of the issue.

How was it working before? We were previously using the Git clone, which has a nice if/then/else that avoids locking all refs.
Comment 1 Matthias Sohn CLA 2019-10-17 19:09:44 EDT
Thomas implemented special handling of clone in [1] which directly writes packed-refs and avoids locking altogether during clone.

Hence this issue could be fixed by implementing option --mirror in CloneCommand to enable cloning a repository including all refs (also non-standard refs like refs/changes and refs/meta).

In addition we may consider to automatically update packed-refs when a fetch updates many refs instead of writing loose refs which need to be locked one by one in case of loose refs. 

[1] https://git.eclipse.org/r/#/c/113412/
Comment 2 Thomas Wolf CLA 2019-10-18 10:20:17 EDT
(In reply to Matthias Sohn from comment #1)
> Thomas

and Sasa!

> implemented special handling of clone in [1] which directly writes
> packed-refs and avoids locking altogether during clone.
> 
> Hence this issue could be fixed by implementing option --mirror in
> CloneCommand to enable cloning a repository including all refs (also
> non-standard refs like refs/changes and refs/meta).

Even then: is a subsequent fetch not likely to get many changed refs/changes?

> In addition we may consider to automatically update packed-refs when a fetch
> updates many refs instead of writing loose refs which need to be locked one
> by one in case of loose refs. 

I think fetch uses BatchRefUpdate by default already. But IIRC that still locks all the loose refs of the refs to be updated, even if they don't exist; I think to prevent issues with concurrent updates by other processes. Otherwise it'd be possible that another process writes a loose ref for some of the refs to be updated while JGit updates the packed refs.
Comment 3 Luca Milanesio CLA 2019-10-18 10:40:23 EDT
(In reply to Thomas Wolf from comment #2)
> (In reply to Matthias Sohn from comment #1)
> > Thomas
> 
> and Sasa!
> 
> > implemented special handling of clone in [1] which directly writes
> > packed-refs and avoids locking altogether during clone.
> > 
> > Hence this issue could be fixed by implementing option --mirror in
> > CloneCommand to enable cloning a repository including all refs (also
> > non-standard refs like refs/changes and refs/meta).
> 
> Even then: is a subsequent fetch not likely to get many changed refs/changes?

If the first fetch fails, the second fetch will get even more refs and will more likely to fail again than the first time.

Once you get a successful fetch, then yes, the number of refs will be reduced afterwards.

> 
> > In addition we may consider to automatically update packed-refs when a fetch
> > updates many refs instead of writing loose refs which need to be locked one
> > by one in case of loose refs. 
> 
> I think fetch uses BatchRefUpdate by default already. But IIRC that still
> locks all the loose refs of the refs to be updated, even if they don't
> exist; I think to prevent issues with concurrent updates by other processes.

Sure, DBMS also have the same problem and typically they do a "lock escalation" to avoid creating hundreds of thousands of locks.

We could detect that we are locking way too many refs and then just escalate to lock the ref-space instead of the individual ones.

Example:

Lock refs/changes/12/312/1
... and many other 100k similar locks

then you escalate the locks and lock the entire 'refs/changes/12/*'
(one lock for all refs that start with refs/changes/12)

... if you still get to the locks overaload

then you escalate to lock the repo.

At the end of the day, if you are locking 100k refs of a repo, you're basically locking most of it :-)

WDYT?
Comment 4 Thomas Wolf CLA 2019-10-18 10:44:06 EDT
Does standard C git do such lock escalation?
Comment 5 Luca Milanesio CLA 2019-10-18 10:45:37 EDT
(In reply to Thomas Wolf from comment #4)
> Does standard C git do such lock escalation?

I am not very familiar with the Git C implementation, however, I can say that this problem doesn't happen for me if I do the same operation using the Git CLI, but it happens using JGit.
Comment 6 Matthias Sohn CLA 2019-10-18 16:36:19 EDT
Locking thousands of loose refs clearly won't scale.
I'll check the C implementation, I think we need to implement the lock protocol followed by the canonical C implementation since we want to be interoperable.

I think we should implement both clone --mirror option and a scalable locking protocol. As a first step I started implementing the mirror option for CloneCommand
Comment 7 Eclipse Genie CLA 2019-10-20 12:39:41 EDT
New Gerrit change created: https://git.eclipse.org/r/151352
Comment 8 Eclipse Genie CLA 2019-10-20 12:39:43 EDT
New Gerrit change created: https://git.eclipse.org/r/151353
Comment 9 Eclipse Genie CLA 2019-10-21 05:20:03 EDT
Gerrit change https://git.eclipse.org/r/151352 was merged to [master].
Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=6216b0de8a829fa764b8c8c51095cf0c5964213f
Comment 10 Eclipse Genie CLA 2019-10-21 05:20:05 EDT
Gerrit change https://git.eclipse.org/r/151353 was merged to [master].
Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=c24eee4fa4db682b138a7a212cf711ab682d02f6
Comment 11 Matthias Sohn CLA 2021-01-24 17:22:02 EST
CloneCommand and CLI Clone command support option --mirror now.
Another option is to use RefTable which has better performance and a more sophisticated locking implementation.

Can we close this bug ?
Comment 12 Luca Milanesio CLA 2021-01-25 10:04:50 EST
The fix added the --mirror option to the clone command, but the bug was on the fetch command.

Was the code in common between the two? Has that also fixed the fetch use-case?

Thanks @Matthias for the fix !
Luca.
Comment 13 Thomas Wolf CLA 2021-05-07 17:34:58 EDT
This should be fixed via bug 573328 now?
Comment 14 Matthias Sohn CLA 2021-05-07 18:04:43 EDT
(In reply to Thomas Wolf from comment #13)
> This should be fixed via bug 573328 now?

I verified that
https://git.eclipse.org/r/c/jgit/jgit/+/180198
fixes this problem.

With this patch I could fetch all refs of the gerrit repository (41k refs) with a ulimit maxfiles of 256 in 3:51min compared to 2:42min using cgit 2.31.0
Comment 15 Luca Milanesio CLA 2021-05-07 19:57:43 EDT
Thanks everyone for fixing this bug, much appreciated !