Bug 456996 - Having .gitconfig file on a network disk makes Egit performance very slow
Summary: Having .gitconfig file on a network disk makes Egit performance very slow
Status: RESOLVED FIXED
Alias: None
Product: EGit
Classification: Technology
Component: Core (show other bugs)
Version: 3.6   Edit
Hardware: PC Windows 7
: P3 normal (vote)
Target Milestone: 5.6   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-01-08 04:05 EST by Roger Oberg CLA
Modified: 2019-11-17 17:45 EST (History)
8 users (show)

See Also:


Attachments
all logs from eclipse process (10.57 MB, application/octet-stream)
2015-01-08 09:51 EST, Roger Oberg CLA
no flags Details
only access to gitconfig (1.64 MB, application/octet-stream)
2015-01-08 09:52 EST, Roger Oberg CLA
no flags Details
tracing-startup.txt (17.71 KB, text/plain)
2019-09-26 12:43 EDT, Marc R. Hoffmann CLA
no flags Details
tracing-contextmenu.txt (34.87 KB, text/plain)
2019-09-26 12:43 EDT, Marc R. Hoffmann CLA
no flags Details
20190927-tracing-startup.txt (17.49 KB, text/plain)
2019-09-27 08:50 EDT, Marc R. Hoffmann CLA
no flags Details
20190927-tracing-contextmenu.txt (14.97 KB, text/plain)
2019-09-27 08:51 EDT, Marc R. Hoffmann CLA
no flags Details
20190927-tracing-openjavaperspective.txt (9.33 KB, text/plain)
2019-09-27 09:06 EDT, Marc R. Hoffmann CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Roger Oberg CLA 2015-01-08 04:05:02 EST
Hi!
In our environment when running Egit in eclipse(Kepler), it defaults the configuration file (.gitconfig) to our network disk (H:)  . (The Location path in git configuration)

However our git repos, workspaces etc. are stored locally on our SSD:s (C:).


Surprisingly this led to that a single operation (e.g. editing a file) took about 30 seconds before label decorations was updated in Eclipse!
After we changed the Location of the .gitconfig file to our local C: directory the same operation took about 2-3 seconds!

At first we thoght that it was some other issues which slowed down our git operations, but since we moved the file it works really good now.

The file itself doesn´t contain that much data, so I think it should be a good idea to cache it´s data instead of reading it alot(if that´s what is causing the performance problem).
Comment 1 Christian Halstrick CLA 2015-01-08 07:04:10 EST
In general this file $HOME/.gitconfig can be changed at any time also by tools outside of eclipse. We have to check the gitconfig file everytime before we do git operations. Nearly all (all?) git operations are potentially influenced by the content of this file. If accessing this file is sooo slow as in your case then maybe even caching doesn't bring much. We would at least check whether the file has been modified since it was cached and I guess even that operation of checking the modification data is slow on your share. 

What did you do the "change the location of the .gitconfig file"? Have you set HOME or HOMEDRIVE/HOMEPATH or HOMESHARE?
Comment 2 Roger Oberg CLA 2015-01-08 07:57:25 EST
I changed the location by setting the environment HOME to a path on the C-drive. 

The network drive is not dead slow, but I guess you read the file so often  (since our git repo is quite large) it takes long time altogether.

At least I could live with that not Every operation which is performed by Egit checks with the .gitconfig first. I mean the file is more or less static after the initial setup. 

In my world, this would be enough (since you are working in Eclipse primarly!)

* At eclipse startup read the gitconfig file to cache
* When starting git configuration in eclipse, reread the file to cache, and update file/cache when adding new entrys.
* Add a button to update the content from file to cache (if updated outside of eclipse)

I also did a small test with Process Monitor, and editing a single file in the workspace, caused 1347 access entries to the gitconfig file!


It would be interesting to see if the performance could be even better when using a cache instead!
Comment 3 Christian Halstrick CLA 2015-01-08 08:43:45 EST
I checked again our code and we do already cache the content of the $HOME/.gitconfig file. We cache it in memory and whenever we need config data we check whether the file was modified since it was cached (using the lastmodified metadata for the file). See [1]. We save reading and especially parsing the file every time. But in your case I guess even checking that metadata is so slow that 1000 checks cause this performance decrease. That would mean that checking 1000 for the lastmodified of a file costs 30 secs. If the most read file in a git environment is put on such a slow filesystem I would expect bad performance.

I am not sure whether we should workaround this problem by flushing our caches only when the user manually hits a "reload" button. I think we should find the root cause: why do we access $HOME/.gitconfig >1000 times if you edit a file (really only edit or do you trigger git operations like "git add"/"git commit"). Sounds like there is a bug in EGit/JGit causing sooo many accesses to .gitconfig. I'll check whether I can reproduce this locally.

[1] https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/FileRepository.java#L385-L391
Comment 4 Matthias Sohn CLA 2015-01-08 09:11:04 EST
Could you retry with the latest EGit nightly build ?

https://git.eclipse.org/r/#/c/38860/ may help here. Earlier we potentially triggered a large number of unnecessary index updates depending on the number of resource change events which may also add to the number of accesses to .gitconfig
Comment 5 Roger Oberg CLA 2015-01-08 09:51:27 EST
Created attachment 249785 [details]
all logs from eclipse process
Comment 6 Roger Oberg CLA 2015-01-08 09:52:33 EST
Created attachment 249786 [details]
only access to gitconfig
Comment 7 Roger Oberg CLA 2015-01-08 09:52:45 EST
I am only editing and saving a single file (no git operations is performed). 
(for information, I do have a quite large repo with ~500 projects from 5-6 different git repos)

I tried the nightly build, but it seems to be the same.

I have attached the process monitor logs if that can help you
Comment 8 Matthias Sohn CLA 2015-01-08 09:59:28 EST
Saving a file triggers a resource change event so it could trigger unnecessary recomputation of git status with older egit version. That's why I asked you to try the nightly.
Comment 9 Matthias Sohn CLA 2015-01-08 10:02:17 EST
I can't read the logs (I am on Mac)
Comment 10 Roger Oberg CLA 2015-01-08 10:18:36 EST
(In reply to Matthias Sohn from comment #9)
> I can't read the logs (I am on Mac)

Ok. The program can be downloaded here http://technet.microsoft.com/sv-se/sysinternals/bb896645.aspx
But you need windows to run it. Perhaps you can run it in a virtual environment?
Comment 11 Christian Halstrick CLA 2015-01-08 10:20:09 EST
I can read the logs. I'll try to reproduce and investigate with Matthias (he's sitting one room away :-) )
Comment 12 Christian Halstrick CLA 2015-01-08 10:44:40 EST
I was investigating Logfile_only_gitconfig.PML. In that file I see 714 file system related events. They can be grouped into exactly 238 groups of:

  1. CreateFile
  2. QueryBasicInformationFile
  3. CloseFile

I think that's the typical outcome for checking 238 times the outcome of $HOME/.gitconfig. Why this takes 30 secs on your filesystem is another question. In an environment where we have 500 projects from 5 repos that seems to be too much ... but not as dramatic as I thought before. Repos are isolated from each other and every repo has it's own cache of $HOME/.gitconfig. So, basically we are checking the lastmodification time of the $HOME/.gitconfig file 40 times. Still too much. Investigate further....
Comment 13 Roger Oberg CLA 2015-01-08 11:58:24 EST
It depends somewhat how many times it accesses the gitconfig file. In some runs it is 1500 times which gives 500 groups. And each time it accesses the config file on the network disk it takes around 60ms which ends up with the total time of 30 seconds.

It seems like the gitconfig file is accessed for each project in the workspace (and a lot of other git-files are accessed over and over again)

Another example, *\platform\.git\HEAD is accessed 635 times which is 16ms altogether, that together with a lot of other file accesses and the total time is around 2-3 seconds (on a SSD-disk), just for saving a single file? Is that really nescessary?


It looks like you could do some opitmisation there I guess.
Comment 14 Christian Halstrick CLA 2015-01-14 11:48:12 EST
I instrumented E/JGit to inspect this problem furter. Here is what I found out so far:

- In the moment you save new content E/JGit is not involved. Saving files is nothing E/JGit is interested in.
- But directly afterwards eclipse triggers the redecoration of all the visible resources. That is what is so costly in your case. Proof: If you turn of Label decorations for git by going to "Preferences: General->Appearance->Label Decorations" and untick "Git" then everythin should be fast. Can you validate this?
- Since you have 500 open projects we are asked at least 500 times to decorate a project. Project decoration of course requires that we access $HOME/.gitconfig (e.g. core.autocrlf influences whether a project is dirty or not).  We do cache .gitconfig in memory for every repo, but we check with a stat() call whether our cached data is still valid. With normal filesystems that's no problem. But in your case a stat() is extremely expensive. 

Next steps: I still don't know why eclipse asks us to decorate all 500 projects when I save a resource in only one project. I'll try to find that out.
Comment 15 Roger Oberg CLA 2015-01-15 08:29:53 EST
I can confirm that having label decorarions off, then everything is very fast!
Comment 16 Michael Keppler CLA 2015-03-17 02:53:46 EDT
I may not be able to contribute to the solution, but I can describe the symptoms of this issue for me:

I'm using egit 3.7 on a fast laptop and egit always works totally fast, with HOME not being defined, and my user home sitting on a network share of our company domain network (all of that in Germany).

Now I'm in Vietnam (with a bandwidth limited company network connection), and without further changes egit is totally slow (while the remaining eclipse and laptop remain fast). Just opening the team menu can take 5 seconds, dragging a file in the staging view to the staging area may take 20 seconds to refresh and so on. Eclipse is randomly frozen for several seconds without me doing an explicit command before.

Since I set HOME to point to a local directory, everything is fine again. But it took me several days to figure out the root cause, even though I know the general egit "HOME is undefined" warning. Therefore if you can do anything to somehow warn the user more strongly about this problem, that would be quite nice.
Comment 17 Andrey Loskutov CLA 2015-03-17 03:21:03 EDT
(In reply to Christian Halstrick from comment #14)
> Next steps: I still don't know why eclipse asks us to decorate all 500
> projects when I save a resource in only one project. I'll try to find that
> out.

GitLightweightDecorator.indexDiffChanged() triggers redecoration of all resources. It *could* be more clever if IndexDiffCacheEntry would supply delta information in notifyListeners() call. The delta is usually (not always!) available - if the notifyListeners() was called from IndexDiffUpdateJob.

But it can be that I've per occasion (partly) fixed that issue: see commit http://git.eclipse.org/c/egit/egit.git/commit?id=c5555627280d55f89a2150f0f0c7ee2e2209697c
This avoids reading BranchTrackingStatus.of() for each project, and so eventually saves us some calls to the configuration?
Comment 18 Christian Halstrick CLA 2015-03-17 03:38:54 EDT
Andrey, it's true that we could enhance GitLightweightDecorator and friends. But if I remember correctly in this specific case we already investigated with OS-level trace files that we spent most of the time to read $HOME/.gitconfig. And that's because the $HOME directory is on a remote filesystem where every access is verry slow.

Michael: yes, good idea, we could give a warning. When EGit comes up it tries to access $HOME/.gitconfig (it does it anyhow) and if the access time >1s then whe could emit a warning.

BTW: I would guess that even on a Windows box a lot of things are slow if the $HOME is on a filesystem which has multisecond access times. maven, native git's bash (.bashrc), vim (.vimrc), curl (.netrc), putty/ssh (.ssh/) ... all these tools will access $HOME and will be slow is $HOME is far away.
Comment 19 Marc R. Hoffmann CLA 2019-09-26 12:42:30 EDT
Hi, I'm currently trying to improve the user experience for a large group of developers which use Eclipse on a daily base. One of the issues are UI freezes (sometimes for several minutes) caused by EGit accessing .gitconfig. My analyis revealed that the cause is a combination two problems:

1) The Windows machines in the organisation define HOMEDRIVE=H: and HOMEPATH=\ therefore .gitignore is read from a network drive (we will somehow workaround this, unfortunatelly EGit has no config option for this).

2) .gitconfig is accessed a enormous amount of times.

To identify issue 2) I instrumented the source of org.eclipse.jgit.util.FileUtils to log the call stack whenever .gitconfig is accessed. For testing I used a Eclipse 2019-03 workspace with a large Git repo and several hundreds projects in the workspace (typical enterprise setup). Here are my results:

- Eclipse startup: 87
- Open context menu in repository view: 236

The file is mostly accessed from main thread which explains long UI freezes.

Please find the aggregated call stacks attached. Reducing the number of calls would also improve the performance for users with local configuration.
Comment 20 Marc R. Hoffmann CLA 2019-09-26 12:43:09 EDT
Created attachment 280052 [details]
tracing-startup.txt
Comment 21 Marc R. Hoffmann CLA 2019-09-26 12:43:55 EDT
Created attachment 280053 [details]
tracing-contextmenu.txt
Comment 22 Michael Keppler CLA 2019-09-26 12:52:46 EDT
Marc, if you used a 2019-03 setup, please upgrade at least egit itself to the current 5.5.0, there have been a lot of changes to reduce the number of accesses to the git config.

That being said, there will be nothing to make this really quick besides not using a network share for the user config (which is why egit even warns you when it detects that situation).
Comment 23 Marc R. Hoffmann CLA 2019-09-26 13:23:54 EDT
(In reply to Michael Keppler from comment #22)
> Marc, if you used a 2019-03 setup, please upgrade at least egit itself to
> the current 5.5.0, there have been a lot of changes to reduce the number of
> accesses to the git config.

Is it part of a recent release train? 

> That being said, there will be nothing to make this really quick besides not
> using a network share for the user config (which is why egit even warns you
> when it detects that situation).

Agreed, therefore it would be great if Eclipse had an option to change that location :)   We use a custom Eclipse distribution with several presets as plugin customizations so it would be very easy for us to change the default. Poking around with environment variables in a managed Windows envoronment is another story... (the Windows environment is beyond our control).
Comment 24 Thomas Wolf CLA 2019-09-26 15:21:01 EDT
(In reply to Michael Keppler from comment #22)
> Marc, if you used a 2019-03 setup, please upgrade at least egit itself to
> the current 5.5.0, there have been a lot of changes to reduce the number of
> accesses to the git config.

Eh, try EGit nightly. The caching from bug 544600 is not in 5.5.0. It improves the context menu case (and other action enablement evaluations).

EGit nightly can be installed from the Egit nightly update site at https://download.eclipse.org/egit/updates-nightly .

How to do similar caching for the RepositoryTreeNodeLabelProvider and for decorators is still unclear, see bug 550878. It would also help if we could get the image decorations out of the synchronous RepositoryTreeNodeLabelProvider into the background decorator without getting blinking of decorations when the repository view is refreshed.
Comment 25 Matthias Sohn CLA 2019-09-26 15:30:21 EDT
5.5.0 introduced caching of ~/.gitconfig and system-level config, caching of most frequently used git meta data which heavily reduces number of accesses to these configurations was introduced on master as Thomas pointed out already.
Comment 26 Marc R. Hoffmann CLA 2019-09-26 15:34:05 EDT
Many thanks for your instant feedback! Just to be clear about my analysis: The issue is not with actually reading these files. The problem is the frequent access to the file attributes.
Comment 27 Thomas Wolf CLA 2019-09-26 15:36:49 EDT
(In reply to Marc R. Hoffmann from comment #26)
> Many thanks for your instant feedback! Just to be clear about my analysis:
> The issue is not with actually reading these files. The problem is the
> frequent access to the file attributes.

Yes, that's right, and that's what bug 544600 fixed for most cases where these accesses occur on the main thread. Bug 550878 is about many remaining cases that occur in decorators (off the main thread and thus not that critical, but nevertheless worth optimizing).
Comment 28 Marc R. Hoffmann CLA 2019-09-26 15:41:17 EDT
(In reply to Thomas Wolf from comment #27)

Beside moving file access out of the main thread I wonder whether it would make sense for example to give FileBasedConfig.isOutdated() some buffer time (let's say of 1 second) to avoid very frequent re-checks.
Comment 29 Marc R. Hoffmann CLA 2019-09-27 08:50:58 EDT
Created attachment 280067 [details]
20190927-tracing-startup.txt
Comment 30 Marc R. Hoffmann CLA 2019-09-27 08:51:21 EDT
Created attachment 280068 [details]
20190927-tracing-contextmenu.txt
Comment 31 Marc R. Hoffmann CLA 2019-09-27 09:01:15 EDT
I just re-tested the exact same workspace with the latest master of JGit and EGit with the exact same workspace as before. Here is how many times .gitconfig is accessed: 

- Eclipse startup: 67 (was 87), 7 from main thread
- Open context menu in repository view: 36 (was 236), 19 from main thread


It is definitely going in the right direction, thanks for all your effort!

Still most of them have a single source:

   +- #36 org.eclipse.jgit.util.FileUtils.fileAttributes()
      +- #36 org.eclipse.jgit.util.FS.fileAttributes()
         +- #36 org.eclipse.jgit.internal.storage.file.FileSnapshot.isModified()
            +- #36 org.eclipse.jgit.storage.file.FileBasedConfig.isOutdated()
               +- #36 org.eclipse.jgit.util.SystemReader.getUserConfig()


Is it really required to check on every config acces? What if the code would record the last timestamp it was checked and re-checks only when the last check was more than n seconds ago?
Comment 32 Marc R. Hoffmann CLA 2019-09-27 09:06:10 EDT
Created attachment 280069 [details]
20190927-tracing-openjavaperspective.txt
Comment 33 Marc R. Hoffmann CLA 2019-09-27 09:09:05 EDT
Ouch, I did another test case with the latest version: Opening the Java Perspective with about 500 Bundles: 4793 (!) calls to FileUtils.fileAttributes() for .gitkonfig. See call traces attached.
Comment 34 Thomas Wolf CLA 2019-09-30 15:59:50 EDT
(In reply to Marc R. Hoffmann from comment #33)
> Ouch, I did another test case with the latest version: Opening the Java
> Perspective with about 500 Bundles: 4793 (!) calls to
> FileUtils.fileAttributes() for .gitkonfig. See call traces attached.

Yes. This is known. There are many calls on background threads from decorators, and that's exactly the subject of bug 550878. Plus there's the image decorations in the git repositories view, which occur on the UI thread. Those can't be just moved to the background decorator; that was tried but it leads to highly annoying flickering when items are refreshed.
Comment 35 Eclipse Genie CLA 2019-11-06 03:20:50 EST
New Gerrit change created: https://git.eclipse.org/r/152110
Comment 36 Eclipse Genie CLA 2019-11-12 10:13:05 EST
Gerrit change https://git.eclipse.org/r/152110 was merged to [master].
Commit: http://git.eclipse.org/c/egit/egit.git/commit/?id=6f1a3c35baab4ca0c5c595ef336d3a12801e804b
Comment 37 Thomas Wolf CLA 2019-11-17 17:45:54 EST
Number of accesses to ~/.gitconfig is massively reduced in EGit 5.6.