Bug 563380 - Concurrent modification exception during pull
Summary: Concurrent modification exception during pull
Status: RESOLVED FIXED
Alias: None
Product: EGit
Classification: Technology
Component: Core (show other bugs)
Version: 5.8   Edit
Hardware: PC Windows 10
: P3 normal (vote)
Target Milestone: 5.9   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-20 03:13 EDT by Kenneth Styrberg CLA
Modified: 2020-08-10 04:16 EDT (History)
3 users (show)

See Also:


Attachments
full stacktrace (232.34 KB, application/pdf)
2020-06-22 03:46 EDT, Kenneth Styrberg CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Kenneth Styrberg CLA 2020-05-20 03:13:51 EDT
Eclipse 2020-03

When pulling 30+ repositories using 8 parallell pulls. I get CME in org.eclipse.m2e.logback.appender. 

No stack trace available.

When run again I get no error. This is very hard to reproduce, typical once a day on different computers.

I also get a Warning: "ExceptionCaught(JGitClientSession[XXXX])[state=Opened] IOException: flush() Stream has been closed", but not at same time as the error above. This is maybe totally unrelated.
Comment 1 Matthias Sohn CLA 2020-05-20 03:41:44 EDT
org.eclipse.m2e.logback.appender looks like maven integration

Without a stack trace I don't know how to interpret the IOException.
Comment 2 Kenneth Styrberg CLA 2020-05-20 04:24:58 EDT
Yes I added it to m2e as the error seems to be from there.

Then I added the warning also, but as I said that seems unrelated to the error.

I think it should be moved back to m2e to investigate the error.
Comment 3 Mickael Istria CLA 2020-05-20 04:27:47 EDT
OK.
Please add more details, like a full stack trace, for the m2e error.
And also please try finding steps to reproduce that do not involve EGit.
Comment 4 Kenneth Styrberg CLA 2020-05-20 05:56:47 EDT
When IDE says "... stack trace not available.". Is there a hidden log containing a stack trace somewhere?
Comment 5 Mickael Istria CLA 2020-05-20 06:16:18 EDT
It depends on how lucky you are ;)
But usually, the <workspace>/.metadata/.log file contains some more details.
Comment 6 Kenneth Styrberg CLA 2020-05-20 06:41:29 EDT
After searching in .metadata folder I found this:

In .metadata/.plugins/org.eclipse.m2e.logback.configuration/0.log in see entries for the ConcurrentModificationException.

The text is "o.e.j.i.t.s.CachingKeyPairProvider - java.util.ConcurrentModificationException" 

The threads are:

"sshd-JGitSshClient[XXXXXXXX]-nio2-thread-X", where X is a number.

or sometimes 

"sshd-SshClient[XXXXXXXX]-nio2-thread-X"

No more information is in this log.
Comment 7 Eclipse Genie CLA 2020-05-20 10:14:38 EDT
New Gerrit change created: https://git.eclipse.org/r/163321
Comment 8 Eclipse Genie CLA 2020-05-20 22:49:12 EDT
Gerrit change https://git.eclipse.org/r/163321 was merged to [master].
Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=97e660e1a54b2bae4b5e830ad45c3d7b9ea9ea21
Comment 9 Matthias Sohn CLA 2020-05-21 05:24:07 EDT
update JGit and EGit to the latest nightly build [1] and try to reproduce the problem, now a stacktrace should be logged if you hit this again

[1] https://download.eclipse.org/egit/updates-nightly
Comment 10 Kenneth Styrberg CLA 2020-05-25 05:06:34 EDT
Ok, I run with Eclipse-2020-06M2 build and got 4 repository fails out of 30+.

As I cannot copy the whole stacktrace, I've manually typed some of the lines that I hope can give some more information.


The main exception is o.e.j.a.e.TransportException: "No more authentication methods available".

at o.e.j.a.FetchCommand.call(FetchCommand.java:222)
at o.e.j.a.PullCommand.call(PullCommand.java:263)
at o.e.e.c.o.PullOperation$PullJob.run(PullOperation.java:256)
..

Caused By: o.e.j.e.TransportException

at o.e.j.t.s.SshdSessionFactory.getSession(SshdSessionFactory.java:224)
at o.e.j.t.s.SshdSessionFactory.getSession(SshdSessionFactory.java:1)
...
at o.e.j.t.Transport.fetch(Transport.java:1260)
at o.e.j.a.FetchCommand.call(FetchCommand.java:211)

Casued By: org.apache.sshd.common.SshException:


at org.apache...
...
at org.eclipse.jgit.internal.transport.sshd.JGitClientSession.messageReceived(AbstractSessionIoHandler.java:63)
...
Comment 11 Kenneth Styrberg CLA 2020-05-25 05:11:37 EDT
Hmm, I wonder if the trace output commit came with M2 release? The other log didn't show any trace but the old "o.e.j.i.t.s.CachingKeyPairProvider - java.util.ConcurrentModificationException"
Comment 12 Kenneth Styrberg CLA 2020-06-02 02:20:20 EDT
I've tried for a couple of days with Eclipse 2020-06 M3 and I still can't reproduce the exception, so the root cause seems to have been fixed. 

So I propose to close the ticket for now, and re-open if it surfaces again.
Comment 13 Kenneth Styrberg CLA 2020-06-18 03:30:21 EDT
Oh well, first run with new 2020-06 release and the CME is back. I didn't see this with RC1 but that might be accidental struck of luck.

Pulling 34 repositories and 3 failed with CME.

As I cannot copy the whole stacktrace, I've manually typed some of the lines that I hope can give some more information.

java.util.ConcurrentModificationException
at java.util.HashMap.computeIfAbsent(HashMap:1134)
at org.eclipse.jgit.transport.sshd.IdentityPasswordProvider.getPassPhrase(IdentityPasswordProvider.java:134)
at org.eclipse.jgit.internal.transport.sshd.PasswordProviderWrapper.getPassword(PasswordProviderWrapper.java:60)
... org.apache.sshd.common.config lines ... 
at org.eclipse.jgit.internal.transport.sshd.CachingKeyPairProvider.loadKey(CachingKeyPairProvider.java:114)
... more org.eclipse.jgit.internal.transport.sshd.CachingKeyPairProvider lines ... :89, :81, :169
at org.eclipse.jgit.internal.transport.sshd.JGitSshClient$CombinedKeyIdentityProvider$1.hasNext(JGitSshClient.java:375)
... org.apache.sshd lines ...
at org.eclipse.jgit.internal.transport.sshd.CachingKeyPairProvider.loadKey(CachingKeyPairProvider.java:114)
Comment 14 Matthias Sohn CLA 2020-06-18 04:32:47 EDT
(In reply to Kenneth Styrberg from comment #13)
> Oh well, first run with new 2020-06 release and the CME is back. I didn't

Does this mean the failing code uses JGit 5.8.0.202006091008-r ?
You initiated parallel pull via egit or is this done via m2e ? I don't use m2e a lot so I don't know what it does.

> see this with RC1 but that might be accidental struck of luck.
> 
> Pulling 34 repositories and 3 failed with CME.
> 
> As I cannot copy the whole stacktrace, I've manually typed some of the lines
> that I hope can give some more information.
> 
> java.util.ConcurrentModificationException
> at java.util.HashMap.computeIfAbsent(HashMap:1134)
> at
> org.eclipse.jgit.transport.sshd.IdentityPasswordProvider.
> getPassPhrase(IdentityPasswordProvider.java:134)
> at
> org.eclipse.jgit.internal.transport.sshd.PasswordProviderWrapper.
> getPassword(PasswordProviderWrapper.java:60)
> ... org.apache.sshd.common.config lines ... 
> at
> org.eclipse.jgit.internal.transport.sshd.CachingKeyPairProvider.
> loadKey(CachingKeyPairProvider.java:114)
> ... more org.eclipse.jgit.internal.transport.sshd.CachingKeyPairProvider
> lines ... :89, :81, :169
> at
> org.eclipse.jgit.internal.transport.sshd.
> JGitSshClient$CombinedKeyIdentityProvider$1.hasNext(JGitSshClient.java:375)
> ... org.apache.sshd lines ...
> at
> org.eclipse.jgit.internal.transport.sshd.CachingKeyPairProvider.
> loadKey(CachingKeyPairProvider.java:114)
Comment 15 Kenneth Styrberg CLA 2020-06-18 06:21:44 EDT
(In reply to Matthias Sohn from comment #14)
> (In reply to Kenneth Styrberg from comment #13)
> > Oh well, first run with new 2020-06 release and the CME is back. I didn't
> 
> Does this mean the failing code uses JGit 5.8.0.202006091008-r ?
> You initiated parallel pull via egit or is this done via m2e ? I don't use
> m2e a lot so I don't know what it does.
> 
Yes it's that version. And I pulled using "Git Repositories" view, I assume it's EGit.

Another note is that it seems to throw when I have many changes in the repoisitories. I cannot reproduce on freshly pulled repositories.
Comment 16 Matthias Sohn CLA 2020-06-18 06:27:05 EDT
(In reply to Kenneth Styrberg from comment #15)
> (In reply to Matthias Sohn from comment #14)
> > (In reply to Kenneth Styrberg from comment #13)
> > > Oh well, first run with new 2020-06 release and the CME is back. I didn't
> > 
> > Does this mean the failing code uses JGit 5.8.0.202006091008-r ?
> > You initiated parallel pull via egit or is this done via m2e ? I don't use
> > m2e a lot so I don't know what it does.
> > 
> Yes it's that version. And I pulled using "Git Repositories" view, I assume
> it's EGit.

then I'd say this bug should be moved to egit or is there another reason why this is on m2e ?

> Another note is that it seems to throw when I have many changes in the
> repoisitories. I cannot reproduce on freshly pulled repositories.

What do you mean by you have many changes ? Don't the freshly pulled repositories also contain many commits ?
Comment 17 Kenneth Styrberg CLA 2020-06-18 06:46:17 EDT
> What do you mean by you have many changes ? Don't the freshly pulled
> repositories also contain many commits ?

Yes commits they have, but as I pull from server there might be new commits that's pulled to my local repositories. Maybe my wording is confusing?

And yes, I think this should belong to EGit/JGit.
Comment 18 Thomas Wolf CLA 2020-06-18 17:42:02 EDT
(In reply to Kenneth Styrberg from comment #13)

> java.util.ConcurrentModificationException
> at java.util.HashMap.computeIfAbsent(HashMap:1134)
> at
> org.eclipse.jgit.transport.sshd.IdentityPasswordProvider.
> getPassPhrase(IdentityPasswordProvider.java:134)
Yes, we have to figure out something to avoid that different sessions share state in this IdentityPasswordProvider. Just using a ConcurrentHashMap will not be good enough. Will have to think about how to refactor this without breaking API.
Comment 19 Thomas Wolf CLA 2020-06-18 17:53:15 EDT
Though it is a bit strange. Each session should already get its own IdentityPasswordProvider. I'll have to check how it could be possible that two sshd nio2 threads belonging to the same session might call this provider concurrently.
Comment 20 Thomas Wolf CLA 2020-06-19 04:27:58 EDT
(In reply to Kenneth Styrberg from comment #6)
> The threads are:
> 
> "sshd-JGitSshClient[XXXXXXXX]-nio2-thread-X", where X is a number.
> 
> or sometimes 
> 
> "sshd-SshClient[XXXXXXXX]-nio2-thread-X"

The latter is _very_ strange. We're only using JGitSshClient, never SshClient. I have no idea where that might come from.
Comment 21 Thomas Wolf CLA 2020-06-19 15:59:01 EDT
(In reply to Thomas Wolf from comment #19)
> Though it is a bit strange. Each session should already get its own
> IdentityPasswordProvider. I'll have to check how it could be possible that
> two sshd nio2 threads belonging to the same session might call this provider
> concurrently.

I don't see this from code inspection of JGit or sshd code, and I cannot reproduce this problem. It shouldn't have anything to do with the amount of data fetched -- a CME at this place means it occurs during authentication, which occurs at the beginning of the ssh connection. There is no re-authentication in the SSH protocol.

Something must go wrong such that either two nio2 threads for the same session run the SSH authentication sub-protocol simultaneously, or (less likely) two nio2 threads for different sessions (i.e., different pulls) perform this sub-protocol simultaneously, but one of these nio2 threads somehow is connected to the wrong session and end  up using the wrong IdentityPasswordProvider. (The IdentityPasswordProvider provides the passphrase for an encrypted SSH private key in the publickey authentication method.)

I see nothing that would cause either. I can't make any progress here without a full stack trace and preferably also a full log containing sshd debug-level logging. (But only if that log includes some thread ID that would allow to figure out which log entry comes from which thread.) Perhaps I can then piece together some understanding as to why this happens at all.

Absent such information the only thing we could do is use a ConcurrentHashMap and hope that makes the CME go away. But I'm worried that this would just mask a much deeper problem. A concurrent map should not be needed there.
Comment 22 Kenneth Styrberg CLA 2020-06-22 03:46:02 EDT
Created attachment 283369 [details]
full stacktrace

Here is a stack trace. The only way I could get it, was as a PDF.
Comment 23 Eclipse Genie CLA 2020-07-26 14:46:44 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/166867