Bug 566170 - Docker: Deadlock when JGit config is in a different drive
Summary: Docker: Deadlock when JGit config is in a different drive
Status: RESOLVED FIXED
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 5.6   Edit
Hardware: All Unix All
: P3 normal (vote)
Target Milestone: 5.9   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-18 16:35 EDT by Kaspar von Gunten CLA
Modified: 2022-03-26 10:37 EDT (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Kaspar von Gunten CLA 2020-08-18 16:35:21 EDT
I'm opening this bug on behalf of https://www.eclipse.org/forums/index.php/t/1103395/, originally described in https://community.sonarsource.com/t/error-locking-filebasedconfig-root-config-jgit-config-failed-after-5-retries/21899/13

We observed the bug when using git-commit-id-plugin:4.0.2 with docker.

Here's the original bug description of Duarte Meneses (Sonar Source). I am not sure if it has already been resolved.

---

JGit version

v5.6.1.202002131546-r

Description

If a git repository and JGit’s configuration file are located in different drives, JGit will fail to save the filesystem’s attributes when it runs for the first time.
The problem is that there is a deadlock involving the configuration file’s lock and the generation of both filesystem’s attributes.

Here is how the deadlock happens:

JGit tries to create a FileSnapshot for the repository’s index
Doesn’t have the FS attributes saved for that drive, so generates it
Proceeds to save it in the configuration file located in the other drive
Next it creates a lock file for the configuration file
Tries to create a FileSnapshot for the lock file.
Repeates steps 2-4 for the attributes of the other drive, and step 4 fails because the lock file already exists.

Impact

Note that it still manages to save the filesystem attributes for 1 of the drives, so next time it runs it will successfully save the attributes for the other drive. A warning is logged but it shouldn’t directly impact applications using the library.

Using 2 drives might sound like a corner case but turns out that CI pipelines with Atlassian and GitLab have this kind of setup in their worker nodes.

Reproducer

You’ll need 2 drives, let’s call them /d1 and /d2.

Place a clone in /d1, let’s say /d1/clone.
Set the environment variable XDG_CONFIG_HOME=/d2
Run the following.
  @Test
>   public void reproduce() throws IOException {
>     try (Repository repo = new RepositoryBuilder()
>       .findGitDir(baseDir.toFile())
>       .setMustExist(true).build()) {
>     }
>   }

Workaround

Setting FS.setAsyncFileStoreAttributes(true) breaks the deadlock loop.
Comment 1 Thomas Wolf CLA 2020-08-20 14:48:03 EDT
Two problems here:

1. Saving to the jgitconfig must happen asynchronously in the background.
   That's safe because the internal cache per FileSystem is already updated.

2. There's an additional cache per directory. If the resolution determination
   runs asynchronously and doesn't deliver its result within 100ms, JGit stores
   the fallback resolution (2sec) there. That value is then never updated when
   the result arrives. Until the next restart, 2s remain in effect for any
   directory JGit tried to access before it had the real value.
Comment 2 Thomas Wolf CLA 2020-08-20 17:17:58 EDT
And a third problem: JGit 5.8.1 doesn't write the JGit config at all.
Comment 3 Thomas Wolf CLA 2020-08-20 18:01:40 EDT
Findings so far:

* JGit 5.8.1 and 5.9.0 reproduce the deadlock. (With
  FS.setAsyncFileStoreAttributes(false).) But in my tests, it's the nested
  blocking call to the future's get() that simply parks the thread executing
  the outer future, but never starts a new thread for the nested future.

* JGit 5.8.1 and 5.9.0 not writing the config at all can occur because we use
  daemon threads. If the JVM shuts down before these threads have written the
  file, we may be left with no file at all, or even worse, only the lock file.

* Writing asynchronously does resolve the deadlock, as expected. Doesn't solve
  the above problem that no file or only the lock file gets written if JVM
  shutdown comes first, though.

* Using non-daemon threads delays JVM shutdown until there are no executor
  threads anymore, i.e., there's a delay of 30 seconds. But it reliably writes
  the file, and it correctly contains data about both file systems.

So it looks like: using daemon threads for determining the resolution is OK. Writing, however, must not be done in daemon threads as it may leave us with an inconsistent file state. Writing asynchronously is OK, but I'll have to think a bit more about how exactly to do it to not get that 30 seconds delay on shutdown.
Comment 4 Eclipse Genie CLA 2020-08-21 03:03:30 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/168041
Comment 5 Eclipse Genie CLA 2020-08-21 03:03:32 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/168042
Comment 8 James Gao CLA 2022-03-26 04:10:13 EDT
This bug still occurs randomly on 5.9 to 5.13 when the FS is somewhat slow, so the git commit maven plugin almost always produces the following warning in a docker container:

> [WARNING] locking FileBasedConfig[/root/.config/jgit/config] failed after 5 retries
Comment 9 Thomas Wolf CLA 2022-03-26 10:37:33 EDT
(In reply to James Gao from comment #8)
> This bug still occurs randomly on 5.9 to 5.13 when the FS is somewhat slow,
> so the git commit maven plugin almost always produces the following warning
> in a docker container:
> 
> > [WARNING] locking FileBasedConfig[/root/.config/jgit/config] failed after 5 retries

This would be a separate issue. It's not the deadlock that was reported here originally.