I’ve been noticing that my Module Manager is taking longer to load now
than it used to, and I finally ran it in a profiler to find out why:
The time it takes the MM to open is proportional to the number of modules
it knows about, and I keep accumulating more modules in my library as
people ask me to test things. Right now, I have 22 modules listed in the
MM, and almost 20% of the time-to-open is consumed by reading metadata
from the modules. This is only going to grow as time goes on…
Most of the modules in the user’s library won’t change between uses of
VASSAL, so module metadata would be a good candidate for caching. The
MM could load as much as 20% faster if we cached module metadata instead
of rereading the metadata each time VASSAL is started.
I think we could do this reliably just by storing module file sizes with
the cached metadata, since sizes of ZIP files are going to be very sensitive
to changes in their uncompressed contents, and it’s far cheaper to check
file size than it is to open a ZIP file and read and parse the metadata.
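A minimal Java sketch of that idea (class and method names here are hypothetical, not VASSAL's actual API): cache each module's parsed metadata alongside its file size, and reread only when the size differs.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a per-file metadata cache invalidated by file size.
// Since module files are ZIPs, a content change almost always changes the
// size, and File.length() is far cheaper than opening and parsing the ZIP.
public class MetadataCache {
  private static final class Entry {
    final long size;
    final String metadata;
    Entry(long size, String metadata) {
      this.size = size;
      this.metadata = metadata;
    }
  }

  private final Map<String, Entry> cache = new HashMap<>();

  // Returns cached metadata if the file's size is unchanged; otherwise
  // calls the supplied reader and refreshes the cache entry.
  public String get(File module, Function<File, String> reader) {
    final Entry e = cache.get(module.getPath());
    if (e != null && e.size == module.length()) {
      return e.metadata;
    }
    final String md = reader.apply(module);
    cache.put(module.getPath(), new Entry(module.length(), md));
    return md;
  }
}
```

A real implementation would presumably also persist the cache between VASSAL sessions, since the whole point is to avoid rereading at startup.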
I was kind of hoping we could avoid the disk seeks entirely.
If the only thing which mattered was CPU time (instead of wall-clock time),
then what you’re suggesting would be a good solution. The two things which
together account for 92% of the CPU time used by loading metadata are (1)
getting the XML reader from the XMLReaderFactory (62%), and (2) parsing
the XML (30%). Come to think of it, if we can reuse the XML reader, then
we can cut out almost 2/3 of the CPU time. Hmm.
(I should qualify what I said earlier about the percentage of time-to-load
which reading module metadata takes: 20% is 20% of CPU time, not wall-clock
time. We’d probably save even more wall-clock time, if we could avoid most
of the disk I/O.)
No, please don’t. The back-and-forth is genuinely useful. Your probing
forced me to look at what exactly was consuming CPU time. Being challenged
on points like this is an excellent way to discover things. Thanks!
I checked in a change in trunk@3942 which has all of the subclasses of
AbstractMetaData using a common XMLReader to avoid the overhead of creating
a new one each time metadata is read. This reduces the CPU time by about
5% overall, which is a start, but not really perceptible.
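The change described could look roughly like this (a sketch, not the actual trunk@3942 code): lazily create a single XMLReader and serialize access to it, since SAX readers are not thread-safe.

```java
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Rough sketch of sharing one XMLReader across all metadata reads,
// avoiding the cost of XMLReaderFactory.createXMLReader() per read.
// Class and method names are illustrative, not VASSAL's.
public class SharedParser {
  private static XMLReader reader;

  public static synchronized void parse(InputSource src, DefaultHandler handler)
      throws Exception {
    if (reader == null) {
      // Created once; reused for every subsequent parse.
      reader = XMLReaderFactory.createXMLReader();
    }
    reader.setContentHandler(handler);
    reader.parse(src);
  }
}
```

The `synchronized` keyword is what makes this safe, and also what makes it a bottleneck if parsing ever moves to multiple threads.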
(Note: Something which might help a lot would be trying to read metadata
on multiple threads. If we do that, we can’t use the synchronized common
parser which I introduced, because that would cause all of the threads
to run sequentially, each one waiting to get the lock in turn. What we’d
need to do instead is create a pool of XMLReaders, from which we hand
out unused ones to threads which request them. This is probably not worth
the effort, though, since the speedup from caching would dwarf the speedup
we might get from multithreading here.)
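For what it's worth, the pool idea might be sketched like this (hypothetical names; not a proposal to actually do it, for the reason above): threads borrow a reader, parse, and return it, so no two threads ever share a reader concurrently.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical fixed-size pool of XMLReaders. borrow() blocks until a
// reader is free, so at most `size` parses run concurrently.
public class XMLReaderPool {
  private final BlockingQueue<XMLReader> pool;

  public XMLReaderPool(int size) throws Exception {
    pool = new LinkedBlockingQueue<>();
    for (int i = 0; i < size; i++) {
      pool.add(XMLReaderFactory.createXMLReader());
    }
  }

  public XMLReader borrow() throws InterruptedException {
    return pool.take();
  }

  public void giveBack(XMLReader r) {
    pool.add(r);
  }
}
```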
As for the caching thing: Brent, I encourage you to look at this for 3.2,
but now that we know what the problem is, maybe we can put this one off
in favor of squashing the remaining 3.1 bugs?