why we get bug reports with truncated errorLogs

uckelman · March 26, 2009, 9:48pm

Our logger is not a lumberjack, and is not ok.

Presently, we’re getting a significant number of bug reports with a truncated
errorLog. This makes some (but not all) of these bug reports useless, as we
have no way of seeing what’s actually gone wrong. The errorLogs are sometimes
truncated because there is a race condition between writing new entries to
the errorLog and reading the errorLog from disk. Specifically, there is no
guarantee that the thread writing the errorLog (in the Module Manager) will
finish doing so before the thread which wants to read the errorLog (in the
component which experienced the error) starts reading.

I’ve thought of two solutions to this problem:

1. Have the thread writing the errorLog signal back (via the CommandServer)
   when it’s finished, and make the reading thread wait for that signal.
2. Have each process write its own errorLog.

#2 avoids the interprocess synchronization problem, but the drawback here is
that we don’t get to see in the errorLog what else is going on (and sometimes
the context can be helpful for troubleshooting).

#1 is somewhat nontrivial, but I do know how to do it.

Does anyone have any other ideas before I go sink a whole day into this?

mkiefte · March 27, 2009, 3:24pm

2009/3/26 uckelman <messages@forums.vassalengine.org>:

Have the thread writing the errorLog signal back (via the CommandServer)
when it’s finished, and make the reading thread wait for that signal.

I thought if you kept a link to the reader thread in the writer thread, you could just wake it up. Otherwise, it just sleeps until it’s woken.

That’s probably not very helpful, but I don’t immediately understand why this is hard.
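
For the single-JVM case, the wake-up idea could look roughly like the sketch below. The names here (FlushSignal, markFlushed, awaitFlushed) are made up for illustration and are not VASSAL code; only the standard wait()/notifyAll() monitor calls are used.

```java
// Hypothetical sketch, not VASSAL code: a reader thread sleeps until the
// writer thread signals that the log has been flushed. This only works
// when both threads live in the same JVM.
final class FlushSignal {
  private boolean flushed = false;

  // Called by the writer thread once it has finished writing the log.
  synchronized void markFlushed() {
    flushed = true;
    notifyAll();  // wake any reader blocked in awaitFlushed()
  }

  // Called by the reader thread; blocks until markFlushed() has been called.
  synchronized void awaitFlushed() throws InterruptedException {
    while (!flushed) {  // the loop guards against spurious wakeups
      wait();
    }
  }
}
```

The flag matters as much as the notification: if the writer finishes before the reader starts waiting, the reader sees flushed == true and never sleeps at all.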

M.

Post generated using Mail2Forum (mail2forum.com)

uckelman · March 27, 2009, 4:37pm

Thus spake Michael Kiefte:

It’s because the reader thread and writer thread need not be in the same
JVM.
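
To make the difference concrete: across JVMs the "wake up" has to travel over some interprocess channel. Below is a minimal sketch of a request/acknowledge exchange over a loopback socket. It is not the real CommandServer protocol; the FlushAck class, the "FLUSH" and "DONE" messages, and both method names are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch, not the real CommandServer protocol.
final class FlushAck {
  // Writer side (e.g. the process that owns the log): accept one connection,
  // wait for a "FLUSH" request, run the flush, then acknowledge with "DONE".
  static void serveOnce(ServerSocket server, Runnable flush) throws IOException {
    try (Socket s = server.accept();
         BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
         PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
      if ("FLUSH".equals(in.readLine())) {
        flush.run();
        out.println("DONE");  // autoflush sends the acknowledgement immediately
      }
    }
  }

  // Reader side (e.g. the process that wants to read the log): request a
  // flush and block until the acknowledgement arrives. Returns true only if
  // the other side confirmed the flush.
  static boolean requestFlush(int port) throws IOException {
    try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
         BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
         PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
      out.println("FLUSH");
      return "DONE".equals(in.readLine());  // blocks until reply or disconnect
    }
  }
}
```

The reading process would call requestFlush() and open the log file only after it returns true.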

–
J.

Messages mailing list
Messages@forums.vassalengine.org
forums.vassalengine.org/mailman/ … engine.org

rk1 · March 27, 2009, 8:25pm

I vote for having separate log files for each process. For context, the automated bug reporter can post them all when submitting a bug.

rk

uckelman · March 28, 2009, 4:05pm

Thus spake Rodney Kinney:

I was about to agree with you when it occurred to me that this can’t
be done properly without ensuring that all of the individual errorLogs
are flushed before the bug reporter reads them, and that is essentially
the same problem as making sure that a single, combined log is flushed
before the bug reporter reads it.

–
J.

uckelman · March 30, 2009, 12:09am

Thus spake “uckelman”:

Have the thread writing the errorLog signal back (via the CommandServer)
when it’s finished, and make the reading thread wait for that signal.

I’ve implemented this as of 3.1.3-svn5413, which you can get here:

nomic.net/~uckelman/tmp/vassal/

I believe this works properly, but since I was fixing a race condition
which I was never able to reproduce myself, it’s not something I can test.

Summary of the changes: Logger.logAndWait() now forces the log queue and
all of the LogListeners to flush before it returns.
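
As a rough illustration of the log-and-wait idea (not the actual code in 3.1.3-svn5413), here is a minimal sketch. SketchLogger and its Listener interface are hypothetical stand-ins for Logger and LogListener, and the queue is assumed to be a single-threaded executor so that entries are handled in submission order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; the real Logger and LogListener in VASSAL may differ.
final class SketchLogger {
  // Stand-in for LogListener: receives entries and can flush its output.
  interface Listener {
    void handle(String message);
    void flush();
  }

  private final List<Listener> listeners = new CopyOnWriteArrayList<>();
  // A single worker thread processes entries strictly in submission order.
  private final ExecutorService queue = Executors.newSingleThreadExecutor();

  void addListener(Listener l) {
    listeners.add(l);
  }

  // Fire-and-forget: the entry is queued and handled some time later.
  void log(String message) {
    queue.submit(() -> {
      for (Listener l : listeners) l.handle(message);
    });
  }

  // Queue the entry, then block until it (and everything queued before it)
  // has reached the listeners and every listener has flushed.
  void logAndWait(String message) throws InterruptedException, ExecutionException {
    queue.submit(() -> {
      for (Listener l : listeners) {
        l.handle(message);
        l.flush();
      }
    }).get();  // get() returns only after the task above has finished
  }

  void shutdown() {
    queue.shutdown();
  }
}
```

Because the worker is single-threaded, waiting on the last task implies that every earlier log() call has already been handled, which is the property the reader of the errorLog needs.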

–
J.
