Community
Participate
Working Groups
We repeatedly get the following exception in Jenkins from JGit when many commits are compared. This results from lack of support for nanosecond resolution on the NonStop platform, and inability to perform operations with the UNDEFINED value in a Java long. Specifically, when a valid file stamp is converted to nanoseconds, then compared with another timestamp and one has an UNDEFINED value, the Java long range is exceeded. This seems like a pervasive issue in JGit. The multiplyExact, for example in the event/trace method appears to not deal with this condition. The situation is file system implementation specific and is not related to the JVM version. Jenkins version: all Jenkins Git Client version: 4.3.0 JGit Version: unknown but packaged with the Git Client. JVM: 1.8.0_45 - 1.8.0.251 Platform HPE NonStop TNS/X (L19.20), and HPE NonStop TNS/E (J06.22) Likelihood: Always 2020-07-30 12:06:27.004+0000 [id=99] SEVERE o.e.j.u.FS$FileStoreAttributes#lambda$3: java.lang.ArithmeticException: long overflow java.lang.ArithmeticException: long overflow at java.lang.Math.multiplyExact(Math.java:1052) at java.time.Duration.toNanos(Duration.java:1186) at org.eclipse.jgit.util.FS$FileStoreAttributes.lambda$1(FS.java:363) at org.eclipse.jgit.util.FS$FileStoreAttributes$$Lambda$245/120256761.get(Unknown Source) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1582) Caused: java.util.concurrent.CompletionException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1584) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
I don't quite understand the description. JGit runs well on FAT (2 seconds file timestamp resolution), and also on Mac with Java 8 (1 second). OK, you _did_ write it was file system specific. From code inspection I don't see where we would ever compare FileStoreAttributes.UNDEFINED_DURATION < or > something. The only other reason for getting that exception shown in your stack trace would be measureFsTimestampResolution returning a crazy value. Duration.toNanos() should not overflow unless the duration is > 9223372036 seconds. So what UNDEFINED value do you mean? Is this some bogus value that this nonstop platform's filesystem returns if one has called Files.setLastModifiedTime() with some nanosecond value?
The problem comes down to: private static final Duration UNDEFINED_DURATION = Duration.ofNanos(Long.MAX_VALUE); When this value is set to any Duration, and even 1 nano is added to the duration, this will cause an arithmetic overflow. What appears is that nano-second resolution on the platform is not supported, so Duration ends up with UNDEFINED_DURATION, so when multiplyExact is called, UNDEFINED_DIRECTION * anything non-zero will overflow. That seems to be the case here.
(In reply to Randall Becker from comment #2) > The problem comes down to: > > private static final Duration UNDEFINED_DURATION = > Duration.ofNanos(Long.MAX_VALUE); > > When this value is set to any Duration, and even 1 nano is added to the > duration, this will cause an arithmetic overflow. What appears is that > nano-second resolution on the platform is not supported, so Duration ends up > with UNDEFINED_DURATION, so when multiplyExact is called, > UNDEFINED_DIRECTION * anything non-zero will overflow. That seems to be the > case here. UNDEFINED_DURATION is used only as a default marker when reading the .jgitconfig entry. I don't see anything added to it anywhere or toNanos() called on it; only two equals comparisons. The stack trace shows an exception after measureFsTimestampResolution(), so the measurement must have produced a ridiculously high (or negative?) value, but I don't see from code inspection how this could happen unless the file system returned bogus file modification times. Can you try the main loop of that measurement on your system? Files.createFile(probe); FileTime t1 = Files.getLastModifiedTime(probe); FileTime t2 = t1; Instant t1i = t1.toInstant(); for (long i = 1; t2.compareTo(t1) <= 0; i += 1 + i / 20) { Files.setLastModifiedTime(probe, FileTime.from(t1i.plusNanos(i * 1000))); t2 = Files.getLastModifiedTime(probe); // print this } Duration fsResolution = Duration.between(t1.toInstant(), t2.toInstant()); Duration clockResolution = measureClockResolution(); fsResolution = fsResolution.plus(clockResolution); Somewhere in this code one of t1, t2, fsResolution or clockResolution must produce a very large value.
I am working on getting testing time for this code snippet and will report back shortly.
This might explain it: t1=2020-08-11T21:09:12Z t1i=2020-08-11T21:09:12Z t2.compareTo(t1)=0 (at loop start) t1i.plusNanos(i * 1000)=2020-08-11T21:09:12.000001Z(1000 nanos) t2=1952-08-24T02:04:48Z t2i=1952-08-24T02:04:48Z fsResolution=PT-595795H-4M-24S(-2144862264000000000 nanos) So the Files.setLastModifiedTime(Instant) appears to be malfunctioning.
I have verified that setLastModified(long) works. The underlying file system resolution is in microseconds, although setLastModified(long) is in milliseconds.
I have run the same test on z/OS USS R2.4. The resulting fsResolution is 0 although the setLastModifiedTime call works but truncates the nanosecond resolution off. The issue is in all JVMs I can find on both platforms.
(In reply to Randall Becker from comment #5) > This might explain it: > > t1=2020-08-11T21:09:12Z > t1i=2020-08-11T21:09:12Z > t2.compareTo(t1)=0 (at loop start) > t1i.plusNanos(i * 1000)=2020-08-11T21:09:12.000001Z(1000 nanos) > t2=1952-08-24T02:04:48Z > t2i=1952-08-24T02:04:48Z > fsResolution=PT-595795H-4M-24S(-2144862264000000000 nanos) > > So the Files.setLastModifiedTime(Instant) appears to be malfunctioning. report this bug for the corresponding JVM port ? Did you also try Java 11 or higher ?
(In reply to Randall Becker from comment #5) > This might explain it: > > t1=2020-08-11T21:09:12Z > t1i=2020-08-11T21:09:12Z > t2.compareTo(t1)=0 (at loop start) > t1i.plusNanos(i * 1000)=2020-08-11T21:09:12.000001Z(1000 nanos) > t2=1952-08-24T02:04:48Z > t2i=1952-08-24T02:04:48Z > fsResolution=PT-595795H-4M-24S(-2144862264000000000 nanos) > > So the Files.setLastModifiedTime(Instant) appears to be malfunctioning. That is... interesting. A FileTime before Instant.EPOCH, and at the same time t2.compareTo(t1) <= 0 is false. (Otherwise that loop would not terminate!) AWhy would fsResolution come out as zero on z/OS? That should only be possible if we lose resolution when converting from FileTime to Instant. Now the _range_ of FileTime is bigger than that of Instant, but the conversion should not lose precision.
It is correct that the loop does not terminate. I had to put a break in it. On z/OS, the timestamp is truncated to milliseconds, so setting the time does not actually change the time and so fsResolution calculates as zero. I do not know whether the actual usable resolution is better than 1s. On NonStop, the timestamp is truncated to microseconds, but the JVM defect is that setLastModifiedTime(Instant) is setting the value incorrectly. I have opened a case with HPE. This will probably not get fixed on their TNS/E platform, which is bad, and it could take months to get a fix for TNS/X. Using setLastModifiedTime(long) would work as the Instant value is correct - although it would truncate. I do not think there is a working exposed interface in Java to set timestamps to microseconds, so the usable resolution really is 1milli. Note: I am not an employee/contractor for either company, so very little of a fix/change for this is within my control.
Also, there is no JVM higher than 1.8 on NonStop TNS/E or on my z/OS system. There is a candidate JVM 11 on NonStop TNS/X, but I do not have access to that as of yet. Still, Jenkins has a minimum support of Java 8, so the code needs to work there.
(In reply to Randall Becker from comment #10) > It is correct that the loop does not terminate. I had to put a break in it. Only on NonStop, or also on z/OS? > On z/OS, the timestamp is truncated to milliseconds, so setting the time > does not actually change the time and so fsResolution calculates as zero. I > do not know whether the actual usable resolution is better than 1s. Even if the resolution is only milliseconds: we keep adding more and more, so after enough iterations, we should be adding milliseconds, and we should see a difference, not zero. We exit the loop when t2.compareTo(t1) > 0, and next we do a Duration.between(t1.toInstant(), t2.toInstant()). So how can we exit that loop and still get zero? Millis or not. Does the Duration.between() call drop them? It does have some exception handling where nanos might get set to zero. FileTime.toInstant() shouldn't drop precision.
(In reply to Thomas Wolf from comment #12) > (In reply to Randall Becker from comment #10) > > It is correct that the loop does not terminate. I had to put a break in it. > > Only on NonStop, or also on z/OS? Both platforms have this. t2 maintains less than or equal to t1 and the loop does not terminate for a very long time - never on NonStop, and 1 second (10^9 nanos) on z/OS. Obviously, I did not let it run that long. > > On z/OS, the timestamp is truncated to milliseconds, so setting the time > > does not actually change the time and so fsResolution calculates as zero. I > > do not know whether the actual usable resolution is better than 1s. > > Even if the resolution is only milliseconds: we keep adding more and more, > so after enough iterations, we should be adding milliseconds, and we should > see a difference, not zero. We are still seeing 0 on z/OS so I think the actual resolution set through setLastModifiedTime is 1 second, not 1 millisecond. > We exit the loop when t2.compareTo(t1) > 0, and next we do a > Duration.between(t1.toInstant(), t2.toInstant()). So how can we exit that > loop and still get zero? Millis or not. Does the Duration.between() call > drop them? It does have some exception handling where nanos might get set to > zero. FileTime.toInstant() shouldn't drop precision. We never exit on NonStop because t2 ends up before the EPOCH. We will eventually exit on z/OS once we count all nanos in a second. FileTime.toInstant() is not dropping resolution, but setLastModifiedTime(Instant) is. So when t2 is picked up from the file system it is either really bad (NonStop) or the same as t1 for a long time (z/OS). Note: I have officially opened a case with HPE Support on this issue. I do not think IBM will consider this a defect.
(In reply to Randall Becker from comment #13) > (In reply to Thomas Wolf from comment #12) > > (In reply to Randall Becker from comment #10) > > > It is correct that the loop does not terminate. I had to put a break in it. > > > > Only on NonStop, or also on z/OS? > > Both platforms have this. t2 maintains less than or equal to t1 and the loop > does not terminate for a very long time - never on NonStop, and 1 second > (10^9 nanos) on z/OS. Obviously, I did not let it run that long. All right, so on z/OS it does eventually exit the loop, and does give a resolution >= 1sec after the loop? When I saw that code I also thought that this would loop quite long if the resolution was large. :-) We can do something about that. I don't know whether whatever we might do for that would also side-step the NonStop issue. Since we don't have a NonStop platform to test test on (at least I don't) we can't make any promises about that. (I'm thinking along the line of doing a kind of binary search between 2s and 0 to find the resolution, using setLastModified(long) for the range > millis and setLastModifiedTime(Instant) for smaller attempts.)
(In reply to Thomas Wolf from comment #14) > (In reply to Randall Becker from comment #13) > > (In reply to Thomas Wolf from comment #12) > > > (In reply to Randall Becker from comment #10) > > > > It is correct that the loop does not terminate. I had to put a break in it. > > > > > > Only on NonStop, or also on z/OS? > > > > Both platforms have this. t2 maintains less than or equal to t1 and the loop > > does not terminate for a very long time - never on NonStop, and 1 second > > (10^9 nanos) on z/OS. Obviously, I did not let it run that long. > > All right, so on z/OS it does eventually exit the loop, and does give a > resolution >= 1sec after the loop? > > When I saw that code I also thought that this would loop quite long if the > resolution was large. :-) We can do something about that. > > I don't know whether whatever we might do for that would also side-step the > NonStop issue. Since we don't have a NonStop platform to test test on (at > least I don't) we can't make any promises about that. > > (I'm thinking along the line of doing a kind of binary search between 2s and > 0 to find the resolution, using setLastModified(long) for the range > millis > and setLastModifiedTime(Instant) for smaller attempts.) I'm going to assume z/OS will stop eventually. Counting that long is actually expensive - in real dollars (we pay for usage). I can certainly test the proposed code.
New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/167706
(In reply to Randall Becker from comment #15) > I'm going to assume z/OS will stop eventually. Counting that long is > actually expensive - in real dollars (we pay for usage). With Java 8 on Mac APFS (1 second resolution) current code takes 233 iterations and about 10ms. > I can certainly test the proposed code. (In reply to Eclipse Genie from comment #16) > New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/167706 Please test the code from that Gerrit change on your z/OS and NonStop platforms.
(In reply to Thomas Wolf from comment #17) > (In reply to Randall Becker from comment #15) > > I'm going to assume z/OS will stop eventually. Counting that long is > > actually expensive - in real dollars (we pay for usage). > > With Java 8 on Mac APFS (1 second resolution) current code takes 233 > iterations and about 10ms. > > > I can certainly test the proposed code. > > (In reply to Eclipse Genie from comment #16) > > New Gerrit change created: https://git.eclipse.org/r/c/jgit/jgit/+/167706 > > Please test the code from that Gerrit change on your z/OS and NonStop > platforms. Tested the method in isolation. The result is printed as: fsResolution=PT1S(1000000000 nanos) There were no messages logged, which is a bit suspicious. I'm not sure the code is working as intended. On HP NonStop, the result should be at millisecond resolution at worst. I added some debugging messages to my copy with the following trace: Initial Loop Trying 1000 t2=1960-02-29T02:54:48Z Trying 1000000 t2=2020-08-14T15:02:03Z Before binary search. high=2,low=0,minScale=1000000 Loop iteration with high=2,low=0,mid=1 delta=1000000000, current scale=1000000000 t2=2020-08-14T15:02:04Z cmp=1 diff=PT1S Loop iteration with high=1,low=0,mid=0 delta=500000000, current scale=100000000 t2=2020-08-14T15:02:03Z cmp=0 no discernible difference Loop iteration with high=10,low=6,mid=8 delta=800000000, current scale=100000000 t2=2020-08-14T15:02:03Z cmp=0 no discernible difference Loop iteration with high=10,low=9,mid=9 delta=900000000, current scale=100000000 t2=2020-08-14T15:02:03Z cmp=0 no discernible difference fsResolution=PT1S(1000000000 nanos)
(In reply to Randall Becker from comment #18) > Tested the method in isolation. The result is printed as: > > fsResolution=PT1S(1000000000 nanos) > > There were no messages logged, which is a bit suspicious. No, that's correct. The initial test for 1000 (1 microsecond) doesn't log, and later the loop goes from larger to smaller intervals, but never even gets near the millisecond range. > I'm not sure the > code is working as intended. On HP NonStop, the result should be at > millisecond resolution at worst. May also depend on the Java version used. On my Mac, I get 1s with Java 8, and 1 microsecond with Java 11. On the same file system. (Which BTW, is normal and correct.) > I added some debugging messages to my copy > with the following trace: > > Initial Loop > Trying 1000 > t2=1960-02-29T02:54:48Z > Trying 1000000 > t2=2020-08-14T15:02:03Z > Before binary search. high=2,low=0,minScale=1000000 > Loop iteration with high=2,low=0,mid=1 > delta=1000000000, current scale=1000000000 > t2=2020-08-14T15:02:04Z > cmp=1 > diff=PT1S > Loop iteration with high=1,low=0,mid=0 > delta=500000000, current scale=100000000 > t2=2020-08-14T15:02:03Z > cmp=0 > no discernible difference > Loop iteration with high=10,low=6,mid=8 > delta=800000000, current scale=100000000 > t2=2020-08-14T15:02:03Z > cmp=0 > no discernible difference > Loop iteration with high=10,low=9,mid=9 > delta=900000000, current scale=100000000 > t2=2020-08-14T15:02:03Z > cmp=0 > no discernible difference > > fsResolution=PT1S(1000000000 nanos) Thanks for testing, but this shows quite clearly that adding 1 second gave a difference, but adding 0.5s did _not_ give any difference. And neither did 0.8 or 0.9. So 1 second is correct.
Note that the code is supposed to report file timestamp resolution *as seen by Java*. This may be worse that what a corresponding C program would get.
(In reply to Thomas Wolf from comment #20) > Note that the code is supposed to report file timestamp resolution *as seen > by Java*. This may be worse that what a corresponding C program would get. Too right. I suspect that "my way" of getting the timestamp in C is not how the JVM does it, but then "my way" involves OS primitives. Even stat() only reports 1 second resolution.
There are a few related JDK bugs for this. The upshot is: it depends on the file system mounted (for instance Mac HFS/HFS+ only stores 1sec resolution), on the OS, and on top of that it depends on the Java version. Java 8 on Mac APFS with OS X 10.14.6 (1ns resolution natively) still only gives you 1s resolution. Java 11 and 13 would give microsecond resolution, and starting from Java 14 one should be able to get nanoseconds on Linux/OS X. (But currently the code stops at 1 microsecond.) On NTFS, the native resolution is 100ns, but you can get that also only from Java 14 on.
Gerrit change https://git.eclipse.org/r/c/jgit/jgit/+/167706 was merged to [master]. Commit: http://git.eclipse.org/c/jgit/jgit.git/commit/?id=2990ad66ade8289f1d91a00b65a2406fabd1dea2