552173 – Too many open files when fetching repos with lots of refs

Bug 552173 - Too many open files when fetching repos with lots of refs

Summary: Too many open files when fetching repos with lots of refs

Status:	RESOLVED FIXED

Alias:	None

Product:	JGit
Classification:	Technology
Component:	JGit (show other bugs)
Version:	5.3.5
Hardware:	PC Mac OS X

Importance:	P3 major (vote)
Target Milestone:	5.12
Assignee:	Project Inbox
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2019-10-17 07:43 EDT by Luca Milanesio
Modified:	2021-05-07 19:57 EDT (History)
CC List:	3 users (show)

See Also:	Gerrit Change Gerrit Change Git Commit Git Commit 573328 Gerrit Change

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luca Milanesio

2019-10-17 07:43:54 EDT

I have raised a Gerrit issue on the GitHub plugin (https://bugs.chromium.org/p/gerrit/issues/detail?id=11763) which is actually a JGit issue.

The problem is this one: how do I fetch a repo with *lots* of refs (e.g. 500k) without running out of file descriptors?

In PackedBatchRefUpdate.lockLooseRefs() it simply loops through all the refs to be updated and create an individual lock file: that is the root cause of the issue.

How was it working before? We were previously using the Git clone, which has a nice if/then/else that avoids locking all refs.

Comment 1 Matthias Sohn

2019-10-17 19:09:44 EDT

Thomas implemented special handling of clone in [1] which directly writes packed-refs and avoids locking altogether during clone.

Hence this issue could be fixed by implementing option --mirror in CloneCommand to enable cloning a repository including all refs (also non-standard refs like refs/changes and refs/meta).

In addition we may consider to automatically update packed-refs when a fetch updates many refs instead of writing loose refs which need to be locked one by one in case of loose refs. 

[1] https://git.eclipse.org/r/#/c/113412/

Comment 2 Thomas Wolf

2019-10-18 10:20:17 EDT

(In reply to Matthias Sohn from comment #1)
> Thomas

and Sasa!

> implemented special handling of clone in [1] which directly writes
> packed-refs and avoids locking altogether during clone.
> 
> Hence this issue could be fixed by implementing option --mirror in
> CloneCommand to enable cloning a repository including all refs (also
> non-standard refs like refs/changes and refs/meta).

Even then: is a subsequent fetch not likely to get many changed refs/changes?

> In addition we may consider to automatically update packed-refs when a fetch
> updates many refs instead of writing loose refs which need to be locked one
> by one in case of loose refs. 

I think fetch uses BatchRefUpdate by default already. But IIRC that still locks all the loose refs of the refs to be updated, even if they don't exist; I think to prevent issues with concurrent updates by other processes. Otherwise it'd be possible that another process writes a loose ref for some of the refs to be updated while JGit updates the packed refs.

Comment 3 Luca Milanesio

2019-10-18 10:40:23 EDT

(In reply to Thomas Wolf from comment #2)
> (In reply to Matthias Sohn from comment #1)
> > Thomas
> 
> and Sasa!
> 
> > implemented special handling of clone in [1] which directly writes
> > packed-refs and avoids locking altogether during clone.
> > 
> > Hence this issue could be fixed by implementing option --mirror in
> > CloneCommand to enable cloning a repository including all refs (also
> > non-standard refs like refs/changes and refs/meta).
> 
> Even then: is a subsequent fetch not likely to get many changed refs/changes?

If the first fetch fails, the second fetch will get even more refs and will more likely to fail again than the first time.

Once you get a successful fetch, then yes, the number of refs will be reduced afterwards.

> 
> > In addition we may consider to automatically update packed-refs when a fetch
> > updates many refs instead of writing loose refs which need to be locked one
> > by one in case of loose refs. 
> 
> I think fetch uses BatchRefUpdate by default already. But IIRC that still
> locks all the loose refs of the refs to be updated, even if they don't
> exist; I think to prevent issues with concurrent updates by other processes.

Sure, DBMS also have the same problem and typically they do a "lock escalation" to avoid creating hundreds of thousands of locks.

We could detect that we are locking way too many refs and then just escalate to lock the ref-space instead of the individual ones.

Example:

Lock refs/changes/12/312/1
... and many other 100k similar locks

then you escalate the locks and lock the entire 'refs/changes/12/*'
(one lock for all refs that start with refs/changes/12)

... if you still get to the locks overaload

then you escalate to lock the repo.

At the end of the day, if you are locking 100k refs of a repo, you're basically locking most of it :-)

WDYT?

Comment 4 Thomas Wolf

2019-10-18 10:44:06 EDT

Does standard C git do such lock escalation?

Comment 5 Luca Milanesio

2019-10-18 10:45:37 EDT

(In reply to Thomas Wolf from comment #4)
> Does standard C git do such lock escalation?

I am not very familiar with the Git C implementation, however, I can say that this problem doesn't happen for me if I do the same operation using the Git CLI, but it happens using JGit.

Comment 6 Matthias Sohn

2019-10-18 16:36:19 EDT

Locking thousands of loose refs clearly won't scale.
I'll check the C implementation, I think we need to implement the lock protocol followed by the canonical C implementation since we want to be interoperable.

I think we should implement both clone --mirror option and a scalable locking protocol. As a first step I started implementing the mirror option for CloneCommand

Comment 7 Eclipse Genie

2019-10-20 12:39:41 EDT

New Gerrit change created: https://git.eclipse.org/r/151352

Comment 8 Eclipse Genie

2019-10-20 12:39:43 EDT

New Gerrit change created: https://git.eclipse.org/r/151353

Comment 9 Eclipse Genie

2019-10-21 05:20:03 EDT

Gerrit change https://git.eclipse.org/r/151352 was merged to [master].
Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=6216b0de8a829fa764b8c8c51095cf0c5964213f

Comment 10 Eclipse Genie

2019-10-21 05:20:05 EDT

Gerrit change https://git.eclipse.org/r/151353 was merged to [master].
Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=c24eee4fa4db682b138a7a212cf711ab682d02f6

Comment 11 Matthias Sohn

2021-01-24 17:22:02 EST

CloneCommand and CLI Clone command support option --mirror now.
Another option is to use RefTable which has better performance and a more sophisticated locking implementation.

Can we close this bug ?

Comment 12 Luca Milanesio

2021-01-25 10:04:50 EST

The fix added the --mirror option to the clone command, but the bug was on the fetch command.

Was the code in common between the two? Has that also fixed the fetch use-case?

Thanks @Matthias for the fix !
Luca.

Comment 13 Thomas Wolf

2021-05-07 17:34:58 EDT

This should be fixed via bug 573328 now?

Comment 14 Matthias Sohn

2021-05-07 18:04:43 EDT

(In reply to Thomas Wolf from comment #13)
> This should be fixed via bug 573328 now?

I verified that
https://git.eclipse.org/r/c/jgit/jgit/+/180198
fixes this problem.

With this patch I could fetch all refs of the gerrit repository (41k refs) with a ulimit maxfiles of 256 in 3:51min compared to 2:42min using cgit 2.31.0

Comment 15 Luca Milanesio

2021-05-07 19:57:43 EDT

Thanks everyone for fixing this bug, much appreciated !