"Surface not cachable" 3.3.2

viewtopic.php?f=8&t=12149&p=62615#p62615

User on “Module Support” forum reporting repeated load failures in 3.3.2 on Empire of the Sun module. I linked one of his error logs in the thread.

bugs.openjdk.java.net/browse/JDK-8072618

The bug in openjdk seems to have been fixed a couple years ago.

The line numbers in the exception stacktraces are also different. Might be that this bug reappeared, might be something else.

No, the bug was never fixed on openJDK, it could not be reproduced in later versions and the bug report was closed. Subtle difference. No one tried to reproduce the bug on the reported version or investigated why it occurred. They just assumed that someone had fixed it because they could not reproduce it in a later version.

The MapShader code has changed since 3.3.2. The last VASSAL line in the stack trace is now line 178:

g2d.fill(area);

and yes, we are trying to paint with custom TexturePaint. The stack trace is pretty much identical, this is almost certainly the same bug that was never fixed, just masked.
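For anyone unfamiliar with the pattern being discussed, here is a minimal, self-contained sketch of filling an `Area` with a `TexturePaint` built from an image, in the spirit of the `g2d.fill(area)` call above. It is not VASSAL's MapShader code: the class name, tile colours and shapes are made up for illustration. It renders into an offscreen `BufferedImage`, which should go through Java2D's software loops rather than the XRender pipeline, so it is not expected to reproduce the crash. It only shows what the failing call is doing.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.TexturePaint;
import java.awt.geom.Area;
import java.awt.geom.Ellipse2D;
import java.awt.image.BufferedImage;

// Hypothetical illustration class, not part of VASSAL.
class TextureShadeSketch {

    // Builds a small two-colour tile standing in for a custom shade-pattern image.
    static BufferedImage makeTile() {
        BufferedImage tile = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = tile.createGraphics();
        g.setColor(Color.RED);
        g.fillRect(0, 0, 4, 4);
        g.setColor(Color.BLUE);
        g.fillRect(0, 0, 2, 2); // top-left quadrant of each tile is blue
        g.dispose();
        return tile;
    }

    // Fills an Area (a rectangle joined to an ellipse) with a TexturePaint made from the tile.
    static BufferedImage shade(int width, int height) {
        BufferedImage target = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = target.createGraphics();
        BufferedImage tile = makeTile();
        // The anchor rectangle sets where the tile repeats from and at what size (here: unscaled).
        g2d.setPaint(new TexturePaint(tile, new Rectangle(0, 0, tile.getWidth(), tile.getHeight())));
        Area area = new Area(new Rectangle(0, 0, width / 2, height));
        area.add(new Area(new Ellipse2D.Double(width / 2.0, 0, width / 2.0, height)));
        g2d.fill(area);
        g2d.dispose();
        return target;
    }

    public static void main(String[] args) {
        BufferedImage img = shade(40, 20);
        System.out.printf("pixel(0,0)=%08x pixel(3,3)=%08x%n", img.getRGB(0, 0), img.getRGB(3, 3));
    }
}
```

On screen the same `fill` would target a window or volatile surface instead, which on Linux is where the XRender pipeline comes into play.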

True.

The line numbers in the Java classes are a little different though, so something did change there over the years.

The module in question seems to be actively maintained and is probably in use by many, I wonder if this bug happens to all users or just some. Also, the bug report is for version 5 revision 5 of the module, but I only see revision 7 in the wiki.

Anyways, I’m on Linux as well, and for this module in revision 7:

  • OpenJDK 11 (GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02)), can reproduce this bug
  • OpenJDK 14 (Zulu14.28+21-CA (build 14.0.1+8)), can reproduce this bug

Seems to be quite the showstopper for Linux users. I wonder if it works on Windows and Mac.

It will likely be a problem with any module that uses Map Shaders with custom shade patterns. I’ll test it shortly on Win10.

bugs.openjdk.java.net/browse/JDK-8201631

“This issue seems restricted to Linux (Ubuntu) only and work fine when checked in MAC OS X 10.13.1 and Windows 10.”

Also, “component: java.awt”, “priority: P3”, and there’s not a single java.awt bug in P1 and P2.

They won’t fix this anymore, awt is ancient technology and they have better things to do. We need to switch to JavaFX.

Or do it like JetBrains does and have the application run on JBR “JetBrains Runtime”, a custom OpenJDK where their engineers fix JDK bugs and add neat features like subpixel hinting.

At least it is an open bug report, and only a year old, but it doesn’t look like anyone is actively working on it.

I could not reproduce it on Win10.

The Zone Of Influence shaders all use custom images to generate the Shade paint. This will affect any module that does the same.

Changing the JDK we use on the affected OS’s is probably the best short term fix. Or do we not bundle on linux?

Nope, we don’t bundle on Linux; Linux has very convenient ways for the user to install JDKs. Remember when Windows and Mac were the user-friendly systems and Linux was for the hardcore types only? How times change…

And I wouldn’t want them to spend time on such bugs. Nobody should be using AWT/Swing anymore. They only keep the code in the JDK out of kindness, they are way too kind.

Look at the package name, “sun.”… remember Sun? :smiley:

Probably the best “fix” is to detect whether we are on linux and disable the whole map-shade-thing altogether, whatever that is.

No, that’s not a fix. That’s a way to piss users off and destroy our brand. Oops, there goes another 1.5% Mr. Spock. Not to worry, we still have 97% left. With any luck there are less than 65 bugs left :slight_smile:

JDKs with bugs in them that never get fixed, who’d have thunk it?

Seriously, we need more information, the JavaFX re-write probably won’t make it into 3.3.3. Is there an alternative JDK available for Ubuntu that we can get the user to try? Does this affect the ‘other’ flavour of linux? Do the in-built shade patterns fail as well, or is it just custom textures loaded from an image? Is anybody willing to dig into the openJDK source to see if we can identify a work-around?

On another note, we SERIOUSLY have to think about some sort of Stable/Unstable release pattern. The test builds just don’t get a wide enough trial to identify problems.

I’m on a Linux system (Kubuntu 20.04), so I figured I’d play with this a little. As expected, with my default Java (OpenJDK 11.0.7), EOTS 5.0rev7 crashes; however, I don’t get a useful error message, just a blank error window (attached screenshot: VASSAL 3.3.1 EOTS5.0rev7 error.png). If I click the OK button twice, the crash window closes, and then the main module window gradually paints its buttons as I cursor over them, but obviously the game is unplayable.

The error log shows the same error as above.

I tried downloading Oracle Java version 14.0.2 and setting it as my default, but still got the same crash (with the same blank crash window). So, I can confirm that this error isn’t only in OpenJDK. I’ll try a few other options later.

This is the first time I’ve managed to get VASSAL to crash, so I’m not sure if the crash window never works for me (I left it sitting for quite a few minutes, to see if it was just being slow to populate, but it never changed)…

Edit: Realized I should probably be using the latest release to test this, so I downloaded 3.3.2 and tried again. The error window populates correctly, so that particular bug has already been fixed, apparently. EOTS5.0rev7 still crashes, though.

OK, tried a few more Java VMs, and none of them worked:

Zulu 14.0.2 and Java.net 16.ea.6 both crashed the same way. Bellsoft 14.0.2 wouldn’t even open the Module Manager, immediately crashing with Exception: java.lang.NoClassDefFoundError thrown from the UncaughtExceptionHandler in thread "main" !

So, I don’t think an alternate Java VM will help at all.

Thus spake Brent Easton:

Changing the JDK we use on the affected OS’s is probably the best short
term fix. Or do we not bundle on linux?

Bundling on Linux is not the right way there, as you run into trouble
with versioning of shared libraries. That’s not normally a problem
because your distro maintainers ensure that everything is compiled
against compatible libraries, but we can’t do that when we’re
distributing one JVM ourselves.


J.

Thus spake Brent Easton:

No, that’s not a fix. That’s a way to piss users off and destroy our
brand. Oops, there goes another 1.5% Mr. Spock. Not to worry, we still
have 97% left. With any luck there are less than 65 bugs left :slight_smile:

Concur.

On another note, we SERIOUSLY have to think about some sort of
Stable/Unstable release pattern. The test builds just don’t get a wide
enough trial to identify problems.

That’s nothing to do with this bug. There have been instances of this in
the bug tracker for many years, but no one has been able to identify how to
reproduce it until now.

I anticipate one of two problems with unstable releases:

  1. We would not get much more uptake of unstable releases than we do of betas.
    All but one of the bugs fixed in 3.3.1 and 3.3.2 are things which could have
    been caught in one of the betas. I announced the betas in all the places
    I normally do, and they were out collectively from the end of April to the
    middle of June.

I see a lot of resistance to trying betas when I suggest people try them
to check if a problem is solved. If unstable releases are clearly marked as
such, and people understand what an unstable release is, then I expect the
same resistance.

  2. On the other hand, maybe people won’t understand what unstable releases
    are. In that case, I expect we’ll have a lot of angry users complaining
    about broken releases, despite our having said they were unstable…

I’m not dead set against it, but I’m not convinced at present.


J.

Fair enough.

I’ve since tried OpenJDK 11 & 14, Oracle JDK 11 & 14, and AdoptOpenJDK 14 with the Eclipse OpenJ9 JVM. All have the same bug; all are obviously based on the same source.

However, I’ve been investigating a potential workaround.

The workaround is to pass -Dsun.java2d.xrender=false to the JVM when starting up the Player or Editor, which disables the XRender graphics pipeline on Linux machines. This option should have no effect on other OSes. Apart from fixing the bug, the module seems to be running fine.
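A launcher could apply Brent’s workaround automatically. The sketch below is hypothetical (the class, the method names and the `example.Player` main class are placeholders, not VASSAL’s actual launcher code): it adds `-Dsun.java2d.xrender=false` to a child JVM’s command line only when `os.name` starts with “Linux”. Since the flag should have no effect on other OSes, adding it unconditionally would also be reasonable; the check just leaves command lines on Windows and macOS unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical launcher helper, not VASSAL's real code.
class XRenderWorkaround {

    // Returns extra JVM options for a child Player/Editor process.
    // The XRender pipeline is only relevant on Linux/X11, so only Linux gets the flag.
    static List<String> jvmOptions(String osName) {
        List<String> opts = new ArrayList<>();
        if (osName.toLowerCase(Locale.ROOT).startsWith("linux")) {
            opts.add("-Dsun.java2d.xrender=false");
        }
        return opts;
    }

    // Assembles: <java executable> [workaround flags] <main class>
    static List<String> command(String javaExe, String osName, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExe);
        cmd.addAll(jvmOptions(osName));
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        System.out.println(command("java", osName, "example.Player"));
    }
}
```

The resulting list could be handed to `new ProcessBuilder(cmd).start()`. Anyone launching by hand can put the same `-D` option directly on the `java` command line.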

This is good. Fixes the bug for me.

Those are parts of the JDK that are ancient and should not be used by anyone in 2020. No one wants the JDK people to fix these bugs; it would be better for everyone if they kicked out the AWT/Swing part altogether. This would relieve them of having to support this piece of crap UI toolkit, and force its users to switch to a different, modern and better one. But, sadly, the JDK people are no friends of Mr. Spock: they do care about minorities, and about keeping backward compatibility, so the terrible decisions that have been made, like Swing, AWT, the Cloneable interface with its friend the clone() method, or Object.finalize(), will always stay in the JDK. At least no one is going to waste time fixing these things, and sane developers know to avoid them.

And about release management, we don’t have enough test coverage, and no army of testers that runs through hundreds of test cases for each module after every release. Beta releases don’t get us enough test coverage, module designers themselves are apparently not reliable, the designer of this particular module apparently didn’t even bother running it once on linux. Not even a simple smoke test. We could make every release a full release i.e. no more betas, just increase the patch version. This would likely increase test coverage but would also piss off more users.

Thus spake Flint1b:

And about release management, we don’t have enough test coverage, and no
army of testers that runs through hundreds of test cases for each module
after every release. Beta releases don’t get us enough test coverage,
module designers themselves are apparently not reliable, the designer of
this particular module apparently didn’t even bother running it once on
linux. Not even a simple smoke test. We could make every release a full
release i.e. no more betas, just increase the patch version. This would
likely increase test coverage but would also piss off more users.

I’m not sure what options we have for increasing test coverage.

  • More unit tests would help, but would only be partially effective,
    as a good chunk of bugs reported are UI bugs which are hard to unit test.

  • There isn’t more developer time to sink into testing. It also wouldn’t
    be an appropriate use of developer time even if there were.

  • I plug our betas everywhere I know of. Maybe there are places I’m missing
    other than this forum, our front-page news feed, GitHub, ConsimWorld, the
    VASL forum, BoardGameGeek, and the two VASSAL Facebook Groups. If there is
    somewhere I’m missing that would make much difference, I’m open to
    suggestions.

  • Convincing users to test is hard. Nearly every time I post somewhere
    about betas, I get someone replying “Nah, I’ll just wait for the release”.
    Thanks for helping, guy!

  • Most module designers won’t have access to more than one OS.

  • The thing which would help most would be having a dedicated group of
    non-developers who try pre-releases regularly, rather than relying on
    random users to wander in and try a thing or two. How do we get there
    from here?


J.

I can trigger the problem reliably on my Linux development system, and the flag suggested by Brent appears to be an effective workaround. I’ve previously used an older version of the EoTS module specifically for testing map shaders, without that flag, and it didn’t trigger this problem. It makes me wonder what’s changed.