Presently, we’re getting a significant number of bug reports with a truncated
errorLog. This makes some (but not all) of these bug reports useless, as we
have no way of seeing what’s actually gone wrong. The errorLogs are sometimes
truncated because there is a race condition between writing new entries to
the errorLog and reading the errorLog from disk. Specifically, there is no
guarantee that the thread writing the errorLog (in the Module Manager) will
finish doing so before the thread which wants to read the errorLog (in the
component which experienced the error) starts reading.
I’ve thought of two solutions to this problem:
1. Have the thread writing the errorLog signal back (via the CommandServer)
   when it’s finished, and make the reading thread wait for that signal.
2. Have each process write its own errorLog.
#2 avoids the interprocess synchronization problem, but the drawback is that
the errorLog would no longer show what else was going on in the other
processes (and sometimes that context is helpful for troubleshooting).
#1 is somewhat nontrivial, but I do know how to do it.
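For illustration, option #1 amounts to a flush-and-acknowledge handshake. The following is a minimal Python model under stated assumptions, not the actual Module Manager code. Threads stand in for the two processes, a `queue.Queue` stands in for the CommandServer channel, and messages from one sender are assumed to arrive in order. `LogWriter` and `read_error_log` are hypothetical names. Because the flush request travels down the same ordered channel as the log entries, the writer handles it only after every earlier entry has been written. The acknowledgment therefore means the log is complete and visible to anyone who opens the file.

```python
import os
import queue
import tempfile
import threading


class LogWriter(threading.Thread):
    """Hypothetical stand-in for the Module Manager's errorLog writer."""

    def __init__(self, path):
        super().__init__(daemon=True)
        self.path = path
        # Assumption: models the ordered CommandServer channel (FIFO per sender).
        self.inbox = queue.Queue()

    def run(self):
        with open(self.path, "a", encoding="utf-8") as log:
            while True:
                kind, payload = self.inbox.get()
                if kind == "entry":
                    log.write(payload + "\n")
                elif kind == "flush":
                    # Every entry queued before this request is already written;
                    # hand the buffered data to the OS so other readers see it.
                    log.flush()
                    # In the real system this would be a reply message sent
                    # back via the CommandServer (assumption).
                    payload.set()
                elif kind == "stop":
                    return


def read_error_log(writer, timeout=5.0):
    """Reader side: request a flush, wait for the ack, then read the file."""
    done = threading.Event()
    writer.inbox.put(("flush", done))
    if not done.wait(timeout):
        raise TimeoutError("errorLog writer did not acknowledge the flush")
    with open(writer.path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "errorLog")
    writer = LogWriter(path)
    writer.start()
    for i in range(1000):
        writer.inbox.put(("entry", f"entry {i}"))
    text = read_error_log(writer)
    writer.inbox.put(("stop", None))
    print(len(text.splitlines()))  # prints 1000
```

The timeout keeps a wedged writer from hanging the bug reporter indefinitely; what to do when it fires (send the partial log anyway, or retry) is a policy choice this sketch leaves open.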
Does anyone have any other ideas before I go sink a whole day into this?
I was about to agree with you when it occurred to me that #2 can’t be done
properly without ensuring that all of the individual errorLogs are flushed
before the bug reporter reads them. But that’s essentially the same problem
as making sure that a single, combined log is flushed before the bug
reporter reads it.