precomputing map tiles

uckelman · August 30, 2009, 10:45pm

I’ve been thinking about what we can do about the remaining problems that
users have reported when trying to load large maps. When loading an image,
there are three things which take time: allocating the original BufferedImage,
decoding the pixel data, and converting the original BufferedImage into a
usable type.

I’ve done some tests to see if we can avoid the conversion step by telling
ImageIO to decode the pixel data directly into the type of BufferedImage we
want to use. That works, but is slower (!) than creating the first
BufferedImage (which will often be TYPE_CUSTOM) and doing the conversion.
I don’t see any way to shave time on this.
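
For readers following along, the decode-then-convert path described above might look roughly like the sketch below. The class name `DecodeThenConvert` is made up for illustration, and TYPE_INT_ARGB as the "usable type" is an assumption; the post does not say which type VASSAL converts to.

```java
import java.awt.AlphaComposite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

final class DecodeThenConvert {
  private DecodeThenConvert() {}

  /** Steps 1 and 2: ImageIO allocates a BufferedImage of its own choosing and decodes into it. */
  static BufferedImage decode(InputStream in) throws IOException {
    BufferedImage img = ImageIO.read(in);
    if (img == null) {
      throw new IOException("no registered ImageIO reader understands this data");
    }
    return img;
  }

  /** Step 3: copy the decoded pixels into the target type (TYPE_INT_ARGB is an assumption). */
  static BufferedImage convert(BufferedImage src) {
    if (src.getType() == BufferedImage.TYPE_INT_ARGB) {
      return src; // already in the target type, nothing to convert
    }
    BufferedImage dst = new BufferedImage(
        src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_ARGB);
    Graphics2D g = dst.createGraphics();
    try {
      g.setComposite(AlphaComposite.Src); // overwrite rather than blend with the empty destination
      g.drawImage(src, 0, 0, null);
    } finally {
      g.dispose();
    }
    return dst;
  }
}
```

A caller would chain the two, e.g. `convert(decode(stream))`. The conversion step costs a second full-size allocation, which is part of why large maps hurt on machines that are short on RAM.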

Image decoding is something where we can potentially save a lot of time. The
amount of time it takes to decode image data and create a BufferedImage from
it is about 10x more than it takes to create a BufferedImage from raw pixel
data written to disk. This means that if we read images in once using ImageIO
and then wrote them back to a disk cache as (compressed) raw pixel data, we
could load images about 10x faster on all loads after the first one. Since a
disk cache can be persistent, this would mean we could have such savings on
all runs of the module after the first. The tradeoff here is a bit of disk
space for better performance. (Two examples: the raw pixel data for non-piece
images in the new 1805 module is 39MB; for Case Blue, it’s 110MB.) I’m
guessing that disk space is not in short supply for anyone these days; anyway,
it should be much more plentiful than RAM or CPU cycles.
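
To make the idea concrete, here is a minimal sketch of storing and reloading compressed raw pixel data. The on-disk layout (width, height, then big-endian ARGB ints, all gzip-compressed) and the class name `RawPixelCodec` are assumptions for illustration only; the post says nothing more specific than "(compressed) raw pixel data".

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class RawPixelCodec {
  private RawPixelCodec() {}

  /** Packs width, height and ARGB pixels into one gzip-compressed byte array. */
  static byte[] encode(BufferedImage img) {
    int w = img.getWidth();
    int h = img.getHeight();
    int[] px = img.getRGB(0, 0, w, h, null, 0, w);
    ByteBuffer buf = ByteBuffer.allocate(8 + 4 * px.length); // big-endian by default
    buf.putInt(w).putInt(h);
    buf.asIntBuffer().put(px); // the int view starts right after the 8-byte header
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(buf.array());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  /** Rebuilds a TYPE_INT_ARGB image from encode() output; no image decoder is involved. */
  static BufferedImage decode(byte[] data) {
    try (DataInputStream in = new DataInputStream(
        new GZIPInputStream(new ByteArrayInputStream(data)))) {
      int w = in.readInt();
      int h = in.readInt();
      byte[] raw = new byte[4 * w * h];
      in.readFully(raw);
      int[] px = new int[w * h];
      ByteBuffer.wrap(raw).asIntBuffer().get(px);
      BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
      img.setRGB(0, 0, w, h, px, 0, w);
      return img;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The byte arrays would be written to and read from cache files by the caller. Whether gzip is the right compressor for this data is a separate question that the sketch does not try to answer.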

What I have kicking around in my head right now is that we could do the
following:

The first time a module is loaded, load each large image and write the raw
pixel data for each tile out to the image cache. This is something for which
we could show a progress dialog (and I think we could even be pretty accurate
about the progress). I did some timing, and found that for the 1805 module, it
took about 150s to tile all of the map and chart images and write them out to
disk; for Case Blue, it took 452s. This is a one-time cost, and since it’s
something which I think will have a huge impact on load time, I think most
users will find it worthwhile. Having all of the unscaled tiles available
from the disk cache would make it unnecessary to ever load the original image
again—so for modules with large maps, that’s a huge savings, both in load
time and in continued RAM usage. (And it would make the in-memory image cache
more flexible, since it wouldn’t contain anything larger than the tile size.)
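
As an illustration of the tiling step, the sketch below cuts an image into fixed-size tiles, with smaller tiles along the right and bottom edges. `Tiler` and `Tile` are hypothetical names, and the tile size is left as a parameter because the post does not state one.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

final class Tiler {
  private Tiler() {}

  /** One tile: its grid position, its dimensions, and its pixels as ARGB ints. */
  static final class Tile {
    final int col;
    final int row;
    final int width;
    final int height;
    final int[] argb;

    Tile(int col, int row, int width, int height, int[] argb) {
      this.col = col;
      this.row = row;
      this.width = width;
      this.height = height;
      this.argb = argb;
    }
  }

  /** Slices src into tiles in row-major order; edge tiles may be narrower or shorter. */
  static List<Tile> slice(BufferedImage src, int tileSize) {
    List<Tile> tiles = new ArrayList<>();
    for (int y = 0, row = 0; y < src.getHeight(); y += tileSize, row++) {
      for (int x = 0, col = 0; x < src.getWidth(); x += tileSize, col++) {
        int w = Math.min(tileSize, src.getWidth() - x);
        int h = Math.min(tileSize, src.getHeight() - y);
        int[] px = src.getRGB(x, y, w, h, null, 0, w); // copies just this tile's pixels
        tiles.add(new Tile(col, row, w, h, px));
      }
    }
    return tiles;
  }
}
```

Each tile's pixel array can then be written to the cache on its own, so later loads only ever touch tile-sized buffers.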

Before I put time into the design work for this, I’d like to hear whether
anyone sees problems with this, has suggestions, etc.

mkiefte · August 30, 2009, 11:13pm

I don’t want to belabour this point as it’s clear you’ve already done
a lot of work on this, but why are we seeing these problems now? In
addition, is there something about the Java standard libraries that is
particularly inefficient? Are there alternative image handling
libraries?

mk

Messages mailing list
Messages@forums.vassalengine.org
forums.vassalengine.org/mailman/ … engine.org

–
Michael Kiefte, Ph.D.
Associate Professor
School of Human Communication Disorders
Dalhousie University
Halifax, Nova Scotia, Canada
tel: +1 902 494 5150
fax: +1 902 494 5151

Post generated using Mail2Forum (mail2forum.com)

uckelman · August 31, 2009, 8:33am

Thus spake Michael Kiefte:

I think we’re seeing some of these problems because of modules which
have high-resolution maps which are intended to be “normal size” at
a scale factor less than one, combined with people trying to run them
on machines which don’t have enough RAM. E.g., the 1805 map is really
not huge—it’s one mapsheet—but the map image is something like 5k
pixels on a side. I expect that anyone playing will not use the 100%
zoom level. I hadn’t seen any examples of this until recently.

As for inefficiencies in Java: Image decoding isn’t something that I
expect could be done a lot faster. The one place where Java forces
you to do extra work is in making a fresh copy of BufferedImages for
which you’ve touched the raw image data, since they can’t be
accelerated after that. It’s irritating that there’s no way to tell
Java that it’s safe to use an accelerated surface for an image again,
because you’re not going to touch the raw image data anymore.

You might be able to find some other image loading libraries which are
marginally faster, but that’s not going to help much when you’re short
on RAM and you’re trying to load a 5000x5000 bitmap. The best solution
I can see is not to load that bitmap at all.

–
J.

uckelman · August 31, 2009, 11:55am

Thus spake Michael Kiefte:

I think there’s also an issue of selection bias here: The users complaining
now were also complaining before, but they were a smaller proportion of the
total complaints then. The users for whom the changes we’ve already made
solved the problem aren’t complaining anymore, so this creates the illusion
that an old problem is new.

–
J.

uckelman · September 13, 2009, 11:06pm

I’ve uploaded a test build where map tiles are precomputed and stored in a disk cache the first time you load a module. This takes a long time for some modules—for Case Blue, you should go make some tea. Once the tiles are built, however, I’m finding performance to be greatly improved.

(When you’re finished, you’ll want to delete the tiles directory in your VASSAL config directory.)

Try 3.1.11-svn6014 and let me know what you think of this change. (The tile cache modifications are in this build only; they’re not yet committed to the repository.)

nomic.net/~uckelman/tmp/vassal/

Brent_Easton · September 14, 2009, 12:18am

The Devil’s Cauldron took 10 minutes on a fast quad-core processor with 4GB of memory. It is a strong disincentive to downloading a module and ‘having a quick look’, especially as most scenarios in TDC only use a small percentage of the maps and charts that had to be converted. Worst affected will be us developers, who often have to download and run a new module to test a bug.

Is it not possible to build up the Tile cache as the map tiles are actually required and used?

B.

uckelman · September 14, 2009, 10:29am

Thus spake “Brent Easton”:

It would be possible to do it that way. I’m afraid that filling the tile
cache for each image on the first request would be annoying for modules
where almost all large images are used in every setup. For example, if
it were like this for Case Blue, on the first load you’d have to wait
for 2-3 minutes multiple times—and this would always happen right when
you’re trying to scroll the map—rather than 10 minutes once, before
you get involved in using the module.

I thought it would be better to take Machiavelli’s approach, and have
all of the unpleasant waiting happen in one shot, rather than repeatedly.

The reason that I’m having it go through the entire images directory to
tile any image larger than 256x256 is that there is no way to tell, for
any given image, whether it will be used in a scenario prior to it being
requested for display in that scenario. I seem to recall complaining about
this before, that we cannot query components to find out what images they
use. If we could ask a saved game which images it uses, then we could
avoid having to tile everything on the first load. This would reduce the
time to fill the tile cache for individual scenarios for modules like TDC.
It wouldn’t change anything for modules like Case Blue, where all of the
map images are always in use.
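
Deciding which images need tiling does not require decoding them. A rough sketch of reading just the dimensions through ImageIO follows; `ImageSizeProbe` is a hypothetical name, and treating "larger than 256x256" as "wider or taller than the threshold" is my reading of the post, not something it states.

```java
import java.awt.Dimension;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

final class ImageSizeProbe {
  private ImageSizeProbe() {}

  /** Reads width and height from the image header without decoding pixel data. */
  static Dimension probe(InputStream in) throws IOException {
    try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
      if (iis == null) {
        throw new IOException("cannot open an image stream on this input");
      }
      Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
      if (!readers.hasNext()) {
        throw new IOException("no ImageIO reader for this format");
      }
      ImageReader reader = readers.next();
      try {
        reader.setInput(iis, true, true); // forward-only, ignore metadata
        return new Dimension(reader.getWidth(0), reader.getHeight(0));
      } finally {
        reader.dispose();
      }
    }
  }

  /** Assumed rule: an image needs tiling if either side exceeds the threshold. */
  static boolean needsTiling(Dimension size, int threshold) {
    return size.width > threshold || size.height > threshold;
  }
}
```

This only answers "is the image big enough to tile?". It does not help with the harder question raised above, namely which images a given scenario will actually request.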

I might be able to squeeze a bit more speed out of writing out the tiles.
Right now I’m writing individual pixels with a DataOutputStream, which
has no support for writing out an int[] in one call. I wouldn’t expect a
rewrite to be more than twice as fast, though.

Did you find that performance was better after the tile cache was built?

–
J.

uckelman · September 14, 2009, 8:09pm

Thus spake “Brent Easton”:

DataOutputStream is even worse than I thought it was. By chucking it and
writing the pixel data myself, I can make this 4.2x faster. Would you find
a wait of 2-3 minutes acceptable for loading TDC the first time?

–
J.

uckelman · September 14, 2009, 11:25pm

Thus spake “Brent Easton”:

Try the 3.1.11-svn6016 build I just uploaded. Is this significantly better
for you? (Remember to delete the tile cache for TDC before you load it.)

–
J.

Brent_Easton · September 15, 2009, 2:13am

Huge improvement at 1 minute, 10 seconds. Quite acceptable as a one-off cost for a large module like this. Some other timings, out of interest:

Memoir '44 - 15s
V40K - 45s
VSQL - 5s
Carcassonne - <1s
South Mountain - 8s
Liberty Roads - 12s
Dungeoneer - 15s
1805 Sea of Glory - 10s

For the average module, it is barely noticeable at < 15s. Just one thing: the progress bar goes up to 100% more than once while it is processing the files.

B.

uckelman · September 15, 2009, 9:01am

Thus spake “Brent Easton”:

Good. DataOutputStream is unbelievably awful. It calls write() four times
in order to write a single int, instead of writing the int to a byte[4]
and calling write() once. That’s why it’s about 4x slower than it needs
to be.
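
A small sketch of the difference, for anyone curious: the first method writes one int at a time through DataOutputStream, and the second packs the whole pixel array into a single byte[] and issues one write(). `PixelWriter` is a hypothetical name and this is not the code from the test build. Both methods emit the same big-endian bytes, so the reading side does not change. How writeInt() is implemented internally depends on the JDK version.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

final class PixelWriter {
  private PixelWriter() {}

  /** Baseline: one writeInt() call per pixel. */
  static void writeOneByOne(int[] pixels, OutputStream out) throws IOException {
    DataOutputStream data = new DataOutputStream(out);
    for (int p : pixels) {
      data.writeInt(p);
    }
    data.flush();
  }

  /** Packs every pixel big-endian into one buffer, then calls write() once. */
  static void writeBatched(int[] pixels, OutputStream out) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(4 * pixels.length); // big-endian by default
    buf.asIntBuffer().put(pixels);
    out.write(buf.array());
  }
}
```

For very large arrays a caller might prefer to batch in chunks rather than allocate one buffer for everything; the sketch ignores that.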

Great!

The first time it goes to 100%, it’s showing the progress of finding the
images in the archive which span multiple tiles. I could change the bar
to be indeterminate at that point. Would that be better?

Related things:

We can remove memory-mapped images now; this makes them unnecessary. Yay!

We need a way to make sure that the disk cache doesn’t get stale. Right
now, if you change an image in a module, the cache will still show you the
old tiles (unless you change the module name or version at the same time).

We might want to think about the size of disk caches. E.g., TDC produces
a 225MB disk cache. Because we test many modules, we’re going to end up
with a few GB of tiles on disk. This might be a non-problem. I’m not short
on disk space. I doubt that you are, either. (It sounds from your description
earlier in this thread that you have a rather new machine.) Most users will
not have 200+ modules at once. Still, clean-up or size limiting bears
thinking about.

–
J.

jimpyle · September 15, 2009, 11:41am

Joel,

Am I the trouble maker who started this extra work?

If you remember, I posted on CSW that I was having significant delays in opening modules using 3.1.10 versus 3.1.9.

If so, would you like me to give this new approach a try? I’d need a place to download it if you do.

Later,

Jim

Brent_Easton · September 15, 2009, 11:47am

Yes, that would be less confusing.

Excellent. That is a confusing option.

How about basing it on the module CRC? There is a routine already to calculate it.

Yes. I have no problems, but we will definitely need to give users some control here. The issue is very similar to a web browser cache; these usually have the following:

Location (you might want to shunt it off to a spare disk)
Option to clear all
Option to clear specific modules?
Overall size limit (à la the Internet Explorer cache)

B.

uckelman · September 15, 2009, 12:08pm

Thus spake “jimpyle”:

Yes, but it’s not really extra work. I’ve been thinking about adding a disk
cache for quite some time. My first demos for testing disk cache speed are
nine months old now.

Until we move to the new web site, I upload all of my test builds here:

nomic.net/~uckelman/tmp/vassal/

The one you want is 3.1.11-svn6016. Keep in mind that this is not release-
quality. (E.g., if you cancel the building of the tile cache for a module,
you won’t be able to load the module after that unless you manually delete
the partially-built tile cache. So don’t do that.)

–
J.

uckelman · September 15, 2009, 12:43pm

Thus spake “Brent Easton”:

I was thinking about using the CRC, the uncompressed size, and the date,
all together. (My reason for not using the CRC alone is that there are
only 2^32 distinct ones, so if we ever had a collision there would be no
workaround for the user. The modification date is independent of the data,
so collisions can be avoided by simply rewriting the ZIP archive.)

We don’t need to calculate the CRC ourselves, since the ZipEntry has that
already. I guess all we’d need to do is store a list with the filename,
CRC, uncompressed size, and modification date of each image file along
with the tile cache, and then check that the module matches this at load
time, regenerating the tiles for any image which has changed. (That should
be fast, probably less than 1 second if the module is unchanged.) As an
even faster check, we could look at the file size and modification date
of the whole ZIP archive—if those match, there’s no point in looking
inside the archive for mismatches. It would be extremely unlikely to have
a ZIP archive where one of the contained image files has changed but it
still compresses to exactly the same size.
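
One way the check described above could be sketched: record a fingerprint (CRC, uncompressed size, modification time) per image entry, and at load time flag any entry whose fingerprint is new or different. `CacheFreshness`, the string fingerprint format, and the `images/` prefix used in the usage note are illustrative assumptions. The `ZipFile` and `ZipEntry` calls are standard `java.util.zip`.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class CacheFreshness {
  private CacheFreshness() {}

  /** Cheap whole-archive check: if this stamp is unchanged, skip looking inside. */
  static String archiveStamp(File zip) {
    return zip.length() + ":" + zip.lastModified();
  }

  /** CRC-32, uncompressed size and modification time of one entry, as stored in the ZIP. */
  static String fingerprint(ZipEntry e) {
    return Long.toHexString(e.getCrc()) + ":" + e.getSize() + ":" + e.getTime();
  }

  /** Maps entry name to fingerprint for every file whose name starts with prefix. */
  static Map<String, String> snapshot(File zip, String prefix) throws IOException {
    Map<String, String> result = new TreeMap<>();
    try (ZipFile zf = new ZipFile(zip)) {
      Enumeration<? extends ZipEntry> entries = zf.entries();
      while (entries.hasMoreElements()) {
        ZipEntry e = entries.nextElement();
        if (!e.isDirectory() && e.getName().startsWith(prefix)) {
          result.put(e.getName(), fingerprint(e));
        }
      }
    }
    return result;
  }

  /** Entries in current whose fingerprint is missing from, or differs from, stored. */
  static Set<String> stale(Map<String, String> stored, Map<String, String> current) {
    Set<String> changed = new TreeSet<>();
    for (Map.Entry<String, String> e : current.entrySet()) {
      if (!e.getValue().equals(stored.get(e.getKey()))) {
        changed.add(e.getKey());
      }
    }
    return changed;
  }
}
```

The snapshot taken when the tiles were built would be saved next to the tile cache. On a later load, `stale(saved, snapshot(module, "images/"))` names the images whose tiles need regenerating.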

In the event that this all fails, we should give the user a way to kill
the existing cache—a button in the prefs would be good for this.

Ok. I think being able to clear the cache for a particular module is
important. I, for one, would be unhappy if the cache getting screwed up
for one module meant that I had to blow away the caches for all 200+
modules I have sitting around.

As for an overall size limit: That will require a bit more infrastructure
to achieve, as then adding a new tile into the tile cache for one module
might cause a tile for some other module to be erased. Rather than have
missing tiles be regenerated on the fly (which could cause potentially
huge images to be loaded at unexpected times), maybe we should regenerate
missing tiles when the module is loaded, and guarantee that tiles will
disappear from the cache only then, and only for modules which are not
already running. That will mean that we would take the cache size limit
as a soft bound, since we might be forced to exceed it in order to
maintain the tile caches for all running modules.

One more thing: Should we be targeting this at the next 3.1 release, or
at 3.2? If the latter, then maybe we should try to get 3.2 out the door
with only the features it has now.

As for planning: I gave my dissertation to the committee on Friday, so
already I’m having dramatically more time to spend on VASSAL; that should
continue for the foreseeable future. This means that I’ll have time now
to finish the work on the new site, fix the things in the trunk that
you’ve (very patiently) waited months for me to fix, and set up the test
Jabber server. If you have any opinions about what order I tackle this
stuff, let me know.

–
J.

tar · September 15, 2009, 5:03pm

On Sep 15, 2009, at 2:01 AM, Joel Uckelman wrote:

Either that or use the option to also add text to the ProgressBar to
identify what it is doing:

“Locating Images”
“Writing Cache”

That way it would be clear that there are two separate operations that
are being performed. This has the advantage of avoiding the
indeterminate state and will also likely have the effect of making the
processing seem faster to users.

Yay!

Could this be done using module file date/times?
If the module is newer, then regenerate? Presumably one would want to
use the folder timestamp or else a small file in the cache so you
don’t have to look at all of the cached files.

I think a size limit makes some sense.
At least that is the approach that web browsers have taken.

There could also be a time-related cleanup function, too. But that
would need some way to track the last time a module was loaded.

jimpyle · September 15, 2009, 6:14pm

Joel,

3.1.11-svn6016 worked like a champ. After the initial build the first time a module was opened, performance seemed very similar to 3.1.9.

Thanks!!!

Jim

Brent_Easton · September 15, 2009, 11:57pm

Yes, that is what I was thinking. The cache for a single module may exceed the cache size limit. I would maintain the cache on a module-by-module basis rather than an image-by-image basis. You always maintain all files for a given module, and maintain enough modules until the limit is exceeded, deleting LRU modules as required.
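
A rough sketch of that policy, combined with the earlier point that modules which are currently running should never lose their tiles, so the limit is only a soft bound. All names here (`TileCacheEvictor`, `ModuleCache`, the field names) are hypothetical, and nothing in the thread says how last-use times would be tracked.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

final class TileCacheEvictor {
  private TileCacheEvictor() {}

  /** Bookkeeping for one module's tile cache. */
  static final class ModuleCache {
    final String name;
    final long bytes;
    final long lastUsedMillis;
    final boolean running;

    ModuleCache(String name, long bytes, long lastUsedMillis, boolean running) {
      this.name = name;
      this.bytes = bytes;
      this.lastUsedMillis = lastUsedMillis;
      this.running = running;
    }
  }

  /**
   * Picks whole-module caches to delete, least recently used first, until the
   * total fits under limitBytes. Running modules are never picked, so the
   * total may stay above the limit.
   */
  static List<String> chooseEvictions(Collection<ModuleCache> caches, long limitBytes) {
    long total = 0;
    List<ModuleCache> candidates = new ArrayList<>();
    for (ModuleCache c : caches) {
      total += c.bytes;
      if (!c.running) {
        candidates.add(c);
      }
    }
    candidates.sort(Comparator.comparingLong((ModuleCache c) -> c.lastUsedMillis));
    List<String> evict = new ArrayList<>();
    for (ModuleCache c : candidates) {
      if (total <= limitBytes) {
        break;
      }
      evict.add(c.name);
      total -= c.bytes;
    }
    return evict;
  }
}
```

Running this only when a module is loaded would match the suggestion that tiles disappear from the cache at load time and never while they are in use.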

Depends on how long you want to wait. I would be tempted to say include it in 3.1. It doesn’t add any module functionality, so shouldn’t break any modules and will make things a lot simpler. 3.2 is going to require a lot of testing and debugging and several of my features need ‘polish’. It will take plenty of time even with adding minimal extra functionality. I would like to set a definite plan for what we want to include in 3.2 and work towards that for a New Year release perhaps?

I’d like to see the Jabber server available for testing asap. That will be a major change which needs extensive testing. Plus, I have a pile of ideas for room control etc.

uckelman · September 16, 2009, 12:06pm

Thus spake “Brent Easton”:

I’m moving the planning discussion to its own thread, to keep it separate
from the tile cache discussion. Reply to follow, there.

–
J.

uckelman · September 21, 2009, 12:43pm

Thus spake “Brent Easton”:

This is turning out to be a bigger job than I thought, because DataArchive
supports pretty much none of the operations I need in order to check for
cache freshness:

DataArchive doesn’t provide any file metadata itself, so I can’t find out
the size or modification date of any images it contains, unless I get the
name of the ZIP archive from it, and then open the ZIP archive myself.

Extensions can contain images which need to be tiled, but DataArchive
doesn’t expose the list of extensions so I can query them. So for
extensions, I can’t even find out the names of the ZIP archives myself.

Unsaved files added to ArchiveWriter (which extends DataArchive) aren’t
even in any ZIP archive yet, and it won’t tell me where they actually are,
so there’s no way to check if they’ve changed while editing a module.

I think this means that finishing the disk cache will require finally killing
off DataArchive—something which is about half-done in the trunk already.

What I’ve been looking at are the NIO additions which will be in JDK 7:

java.sun.com/developer/technical … avase/nio/

The Path and FileSystem abstract classes are just what we need for this;
since they’re fully abstract, it’s simple to backport them, and then we can
just chuck the backports sometime in the future when we require Java 7.

What I’d do is turn the ZipArchive class I created in the trunk (to be a
DataArchive replacement) into a FileSystem subclass, and then have another
subclass of that which glues extensions together. That would be sufficient
to get all of the functionality that the DataArchive provides presently,
that the ZipArchive provides in the trunk, and that I need for the disk
cache.
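
For readers on Java 7 or later: the JDK as released includes a ZIP file system provider, so the kind of per-file metadata discussed above can be read through the Path and Files API. The sketch below only illustrates the abstraction; it is not the ZipArchive subclass proposed here, and `ZipMetadata` is a made-up name.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

final class ZipMetadata {
  private ZipMetadata() {}

  /** Returns "uncompressedSize:lastModifiedMillis" for one entry inside a ZIP archive. */
  static String describe(Path zip, String entryName) throws IOException {
    // The cast picks the (Path, ClassLoader) overload, which exists since Java 7.
    try (FileSystem fs = FileSystems.newFileSystem(zip, (ClassLoader) null)) {
      Path entry = fs.getPath("/" + entryName);
      BasicFileAttributes attrs = Files.readAttributes(entry, BasicFileAttributes.class);
      return attrs.size() + ":" + attrs.lastModifiedTime().toMillis();
    }
  }
}
```

Because the archive is just another FileSystem, code written against Path does not need to know whether an image lives in a ZIP archive or in an ordinary directory.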

This is more changes than I feel comfortable putting into 3.1.11, so now
I think this should go into 3.2. Practically, that only means that I’ll
rebase what I’ve already done to the trunk and keep working on this. As
a result, the ZipArchive problems you reported will get resolved sooner
rather than later, since I’ll fix those along the way. Then, once I’m
clear of this, back to the web site.

–
J.
