Bug 565048 - Files dirty directly after clone with text=auto in .gitattributes
Status: RESOLVED FIXED
Product: JGit
Classification: Technology
Component: JGit
Version: 5.8
Hardware: All Linux
Importance: P3 normal
Target Milestone: 5.9
Assignee: Project Inbox
Reported: 2020-07-08 06:18 EDT by Rene Pfeuffer
Modified: 2020-08-19 08:04 EDT
Description Rene Pfeuffer 2020-07-08 06:18:13 EDT
We have a repository with files that were checked in with Windows line breaks (CRLF). After these files had been checked in, a .gitattributes file was added with the following content:

* text=auto

On a Linux system, from this commit on, the files with Windows line breaks are checked out with Unix line breaks (LF), but git considers them modified immediately afterwards (which in fact they are). Even a reset does not restore them (they keep their LF line breaks). As a result, merges cannot be executed, because git refuses to merge with a dirty workspace.

Native Git handles this differently: the files are checked out exactly as they are stored in the repository, without modifying the line breaks (that is, they keep CRLF). Therefore the files are not treated as modified.
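
To make the difference concrete, here is a minimal, stdlib-only Java sketch of the two checkout behaviours described above. It is a toy model and not JGit or native Git code: the class name, the `crlfToLf` helper, and the byte-for-byte `isDirty` comparison are all assumptions made for illustration (real clients also run check-in filters before comparing).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model only: names and logic are illustrative, not taken from JGit or git.
class CrlfCheckoutDemo {

    // Hypothetical helper: replaces every CRLF pair with a single LF.
    public static byte[] crlfToLf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            boolean crBeforeLf = in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(in[i]); // keep everything except the CR of a CRLF pair
            }
        }
        return out.toByteArray();
    }

    // Simplified dirty check (assumption): the file counts as modified
    // when the working tree bytes differ from the stored blob.
    public static boolean isDirty(byte[] storedBlob, byte[] workingTree) {
        return !Arrays.equals(storedBlob, workingTree);
    }

    public static void main(String[] args) {
        byte[] blob = "line1\r\nline2\r\n".getBytes(StandardCharsets.US_ASCII);

        // Behaviour reported for JGit 5.8: line breaks are converted on checkout.
        byte[] converted = crlfToLf(blob);
        // Behaviour reported for native Git: the blob is written out unchanged.
        byte[] asIs = blob.clone();

        System.out.println("converted checkout dirty: " + isDirty(blob, converted));
        System.out.println("as-is checkout dirty: " + isDirty(blob, asIs));
    }
}
```

Under these assumptions the converted checkout is reported as dirty right after the clone, while the as-is checkout is clean, which matches what we observe.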

JGit should adopt this behaviour of the native Git client and not modify the line breaks of files that were checked in with CRLF, even on Unix systems.

You can reproduce this behaviour by cloning the following repository on a Unix system with JGit: https://github.com/pfeuffer/crlf-demo
Comment 1 Thomas Wolf 2020-07-08 09:08:17 EDT
Can reproduce. I wonder, though, if something has changed in git.

Assuming there are no other crlf-related settings, I _think_ the default of core.eol kicks in, which is "native" according to [1]. Which would give the jgit behavior.

There must be something among all these git crlf-related settings that overrides this... but probably undocumented. It does look like C git indeed just checks out files stored in the index with CRLF as is.

Frankly, eol handling in git is broken: way too many options with unclear interactions, legacy options that one isn't supposed to use anymore but which are still supported, and behavior that has changed over the years.

I can even set the .gitattributes file to "* text=auto eol=lf" and _still_ C git checks the file out with CRLF.

We have understood "When the file has been committed with CRLF, no conversion is done." at [2] to mean "no conversion upon check-in is done". Apparently it means "no conversion on check-in or check-out is done".
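
The two readings of that sentence can be written down as a small Java sketch. This is only a model of the interpretation question, assuming `text=auto` and a file already detected as text; the class, enum, and method names are hypothetical and do not correspond to JGit or C git internals.

```java
import java.nio.charset.StandardCharsets;

// Illustrative model of the two readings; not actual JGit or C git logic.
class AutoCrlfRuleSketch {

    enum Direction { CHECK_IN, CHECK_OUT }

    // True if the committed content contains at least one CRLF pair.
    public static boolean containsCrlf(byte[] data) {
        for (int i = 0; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                return true;
            }
        }
        return false;
    }

    // First reading: a CRLF blob only suppresses conversion upon check-in;
    // check-out may still convert.
    public static boolean mayConvertFirstReading(byte[] committed, Direction direction) {
        return direction == Direction.CHECK_OUT || !containsCrlf(committed);
    }

    // Second reading: a CRLF blob suppresses conversion in both directions,
    // so the direction argument is deliberately ignored.
    public static boolean mayConvertSecondReading(byte[] committed, Direction direction) {
        return !containsCrlf(committed);
    }

    public static void main(String[] args) {
        byte[] crlfBlob = "a\r\nb\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("first reading, check-out: "
                + mayConvertFirstReading(crlfBlob, Direction.CHECK_OUT));
        System.out.println("second reading, check-out: "
                + mayConvertSecondReading(crlfBlob, Direction.CHECK_OUT));
    }
}
```

For a blob committed with CRLF, the first reading still allows conversion on check-out, whereas the second reading does not; the second is what C git appears to do.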

Changing this may not be easy; it affects not only check-out but also merge (and I don't even want to think about merging two branches, one with the file with CRLF, the other with LF, and differing .gitattributes related to eol), and tests will have to be re-analyzed and changed.
 
[1] https://git-scm.com/docs/git-config#Documentation/git-config.txt-coreeol
[2] https://git-scm.com/docs/gitattributes#_checking_out_and_checking_in
Comment 2 Eclipse Genie 2020-07-23 16:14:10 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/166755
Comment 3 Thomas Wolf 2020-07-23 16:15:34 EDT
(In reply to Eclipse Genie from comment #2)
> New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/166755

EGit will also need an update to make use of the new versions of TreeWalk.getEolStreamType() introduced in this commit.
Comment 4 Eclipse Genie 2020-07-24 08:27:36 EDT
New Gerrit change created: https://git.eclipse.org/r/c/egit/egit/+/166800
Comment 5 Thomas Wolf 2020-07-26 15:11:40 EDT
(In reply to Thomas Wolf from comment #3)
> (In reply to Eclipse Genie from comment #2)
> > New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/166755
> 
> EGit will also need an update to make use of the new versions of
> TreeWalk.getEolStreamType() introduced in this commit.

Scratch that; the JGit change has been rewritten completely. EGit still needs a little adaptation in one place.