About bug 12554

No major surgery, instead leave the old way and put the new way at its side and allow user to select between old and new way: yes this is what I’m aiming for, this is what I know as common practice in dealing with legacy code.

Now, forgive me for being direct.

Too risky: yes, I understand why the code deteriorated so much, instead of crying about how bad things are and making plans for a complete rewrite in this glorious so-much-better-than-java language, it would have been your job to DO major surgery, and if you don’t get rid of this “boohoo it’s too risky we are shitting our pants” mindset the exact same deterioration will happen to your new Vassal.

Stop messing around, just try this and that: if you know how to solve it, or want to try a certain fix, why not do it yourself, it is open source code. I haven’t understood the problem yet, and I am not ready to solve it at this point. A wise man once said, in order to solve a problem, one has to understand the problem first. This is why I am doing the refactoring, the major surgery, I don’t do it for keeping the results, for now I just do it to understand the problem. Once the problem is understood, we can still talk about how to solve it and how to package the fix.

“No-one has actually put their finger on exactly where this is going wrong” - either I am going to put a finger on this, or I will not fix this bug at all.

So effectively move the “apply key” part from before tracker.repaint() to after tracker.repaint() and hope this is a map repaint problem? And the clean way of doing this would be to simply move the tracker.repaint() 6 lines up and put it before the “apply key” part, not to extract the whole “apply key” into the caller method.

I even tried it, nothing changed. Any more shots in the dark or can I proceed with a proper analysis? It is hard and time-consuming as it is, the code seems to not have been touched since Java’s pre-1.2 times, since 1998, I just almost got a heart attack when I saw what kind of manual array acrobatics the Stack does with its contents. Was is that hard to replace this with a simple ArrayList which has existed for 22 years…

Ah Well. I’ll just shut up now. You’ve been working on the project for what, 5 minutes and have it all nutted? Good luck to you.

I agree - closest I’ve come so far was tracing through and discovering that upon undo the commands are being processed in the apparently-correct order – that the piece is being “moved back” to the correct set of coordinates, but for some reason doesn’t appear to us as being back.

Which does suggest that Send To Location could be “missing something” – for example it isn’t doing an “add piece” the way the PieceMover does. Add Piece (and here is where my understanding of the architecture isn’t complete enough yet) seems to relate to the stacking functions, and so the fact that it’s being left out certainly raises a suspicious flag.

Hopefully I will have time to dive in another around before the weekend is done.

Brian

Thus spake Flint1b:

No major surgery, instead leave the old way and put the new way at its
side and allow user to select between old and new way: yes this is what
I’m aiming for, this is what I know as common practice in dealing with
legacy code.

Now, forgive me for being direct.

Too risky: yes, I understand why the code deteriorated so much, instead
of crying about how bad things are and making plans for a complete
rewrite in this glorious so-much-better-than-java language, it would
have been your job to DO major surgery, and if you don’t get rid of this
“boohoo it’s too risky we are shitting our pants” mindset the exact same
deterioration will happen to your new Vassal.

This “deterioriation” you’re talking about never happened. Going on about
it makes it hard to take seriously what you’re saying. As I have already
explained, we inherited the code you’re referring to, and it was mostly
written that way to begin with. What you’re seeing now is the where we’ve
reached after many, many attempts to rectify things. You didn’t see what
it looked like in 2006.

I was extremely reluctant to come around to the idea that we needed to
start over. There are old posts in the forum where I explained a plan for
doing model-view separation. Then I tried it, and realzied that keeping
things working while doing that was far, far more work than writing
something new with tests validating that it does what it is intended to
do.

Furthermore, there’s no way to ensure you haven’t broken existing modules,
saves, and logs when you don’t have a spec for what anything is intended to
do. We’ve fixed many bugs over the years without that because some bugs are
so obviouslly wrong that they violate any reasonable intent, so we don’t
need to know exacty what the actual intent was in order to fix them.
On a few occasions, we’ve fixed something to find that so many modules
dependend on it being broken that we had to leave it as it was. You might
be surprised by how angry users can be when their saves and logs stop
working.

Stop messing around, just try this and that: if you know how to solve
it, or want to try a certain fix, why not do it yourself, it is open
source code. I haven’t understood the problem yet, and I am not ready to
solve it at this point. A wise man once said, in order to solve a
problem, one has to understand the problem first. This is why I am doing
the refactoring, the major surgery, I don’t do it for keeping the
results, for now I just do it to understand the problem. Once the
problem is understood, we can still talk about how to solve it and how
to package the fix.

“No-one has actually put their finger on exactly where this is going
wrong” - either I am going to put a finger on this, or I will not fix
this bug at all.

I want to be clear that we’re talking about the same thing here:

Change whavever you need to change while debugging. It’s local to you. If
refactoring some class helps you figure out what’s wrong, great.

We have to be more conservative about what goes into releases. It is a
massive waste of time and goodwill to put out releases which break things
for users. If we’re going to do refactoring which could affect module
behavior and have that go into master, we need to be very sure that it
won’t cause a mess.

And, in general: Chill a bit. It’s not necessary to be so aggro. You
think we’ve done a bad job. I’ve explained that we’re a long way ahead
of where we started. We don’t agree on these points and I suspect that
we won’t. That need not prevent us from fixing this bug.


J.

Possibly relevant other Undo issues:

https://forum.vassalengine.org/t/undo-problem-captured-in-log-file/7622/1
vassalengine.org/tracker/sho … i?id=12951
https://forum.vassalengine.org/t/test-builds-for-3-3-0/10056/91
https://forum.vassalengine.org/t/test-builds-for-3-3-0/10056/170
https://forum.vassalengine.org/t/test-builds-for-3-3-0/10056/195

Several of these links are for a problem which was reported against some earlier 3.3.0 test builds C&C which had a fix for Bug 12538 that had some unintended consequences for Undo. (I still don’t grasp why these changes cause a problem, as they look right to me.)

Check this out… I’m trying to figure out why PieceMover generates an “AddPiece” but the SendToLocation does not. Biggest mystery is in the last code fragment.

So in the 12554 instance, the PieceMover is running the following code to for its part of the move (line 570 of the 3.2.17 PieceMover):

if (mergeWith == null) { comm = comm.append(movedPiece(dragging, p)); comm = comm.append(map.placeAt(dragging, p)); if (!(dragging instanceof Stack) && !Boolean.TRUE.equals(dragging.getProperty(Properties.NO_STACK))) { final Stack parent = map.getStackMetrics().createStack(dragging); if (parent != null) { comm = comm.append(map.placeAt(parent, p)); } } }

The second part of the move from SendToLocation calls into Map.placeOrMerge and runs:

    Command c = apply(new DeckVisitorDispatcher(new Merger(this, pt, p)));
    if (c == null || c.isNull()) {
      c = placeAt(p, pt);
      // If no piece at destination and this is a stacking piece, create
      // a new Stack containing the piece
      if (!(p instanceof Stack) &&
          !Boolean.TRUE.equals(p.getProperty(Properties.NO_STACK))) {
        final Stack parent = getStackMetrics().createStack(p);
        if (parent != null) {
          c = c.append(placeAt(parent, pt));
        }
      }
    }

So that’s ALMOST the same logic. What’s different is the existence of the call to “movedPiece” in the first version before the placedAt-ing starts.

The movedPiece does this:

    setOldLocation(p);
    Command c = null;
    if (!loc.equals(p.getPosition())) {
      c = markMoved(p, true);
    }
    if (p.getParent() != null) {
      final Command removedCommand = p.getParent().pieceRemoved(p);
      c = c == null ? removedCommand : c.append(removedCommand);
    }
    return c;

But I don’t see anything in there that’s raising red flags for this (although it’s interesting that SendToLocation version apparently isn’t checking to see if piece needs to be “removed” from an earlier parent – but I don’t think we’re seeing a pieceRemoved command sequence in our final command string anyway)

So given that both versions seem to ultimately call into Map.placeAt(), what I’m now trying to figure out is why the call from PieceMover seems to generate an AddPiece command (as well as a MovePiece) command, but the SendToLocation call does NOT:

  public Command placeAt(GamePiece piece, Point pt) {
    Command c = null;
    if (GameModule.getGameModule().getGameState().getPieceForId(piece.getId()) == null) {
      piece.setPosition(pt);
      addPiece(piece);
      GameModule.getGameModule().getGameState().addPiece(piece);
      c = new AddPiece(piece);
    }
    else {
      MoveTracker tracker = new MoveTracker(piece);
      piece.setPosition(pt);
      addPiece(piece);
      c = tracker.getMoveCommand();
    }
    return c;
  }

See they both call Map.addPiece() but we’re only actually getting an AddPiece command on the PieceMover version. Could we be seeing a problem in (gulp) MoveTracker? That’s a rabbit hole I haven’t been down yet, I’ve just been “assuming it works”.

Or actually, the TOP part of the “if” DOES manually insert an AddPiece into the command, but the bottom part doesn’t. So for some reason PieceMover apparently goes through the top part and the SendToLocation goes through the bottom part? That seems… odd. I would have expected them to both go through the bottom part.

And I’m still left wondering what the AddPiece is “for” and why it is only put in the sequence sometimes.

Any insights will be received with great gratitude!

Brian

I’m coming to realize that the “AddPiece” we’re getting is actually the creation of the “parent” piece for a stack. But I’m looking at those first two code fragments above and baffled why one of them is creating a parent for a stack and the other isn’t?!?!

This is the second time I’ve composed a message that disappeared in the ether. I have some information that may be helpful.

Earlier I had reported my version of the undo bug where I send-to pieces from zone to zone. My problem is not at undo time. Mine is when I go forward. The undo poisons later behavior. But it might provide a clue. I created a small 6k mod that 100% shows a problem if anyone cares to see it.

In my mod, I have 6 numbered zones. If I press a button, pieces advance to the next zone, so a piece in ‘2’ goes to ‘3’. I must have at least 2 pieces in zones to have a problem.

After the undo, a piece in ‘2’ is supposed to go to ‘3’, but instead it jumps to ‘4’, joining its brother who came from ‘3’. I query the piece, and it tells me its OldZone was ‘3’. No, it’s supposed to be ‘2’. OldZone is wrong.

So I surmise the undo caused the bad piece to use the other piece’s OldZone. Since state variables tend to be clustered in classes and structures, it suggests to me that not just OldZone is wrong. I surmise that during the undo, the bad piece’s state got linked to someone else’s. If true, then my observation is just a tip of an iceberg, and it be a door to other undo problem explanations.

If it were me (and I’m retired and keep bees), I’d run it in a symbolic debugger and watch specifically what happens.

Oh, but meanwhile I’m getting even more convinced that it is a Bad Thing (or at least “an Unintended Thing that is now causing this problem, and we can all begin our Futile Prayers That Some Modules Won’t Turn Out To Depend On This”) that when a piece move runs through an Undo sequence that there is NOT a new “stack parent” being created to receive it back at its old position.

Oh yeah actually from the code I just traced through they would be TOTALLY wrong. Because it isn’t appending command sequences to set the “OldLocationName” and “OldZone” flags – it’s just writing them right there on the fly. Meaning that they DON’T get undone properly (or at all) when a unit is un-done to a different location, and therefore if it then checked its OldZone and stuff they would be incorrect for its current position.

Now mind you, I always thought of OldZone (et al) as things I could only count on being right “immediately after a move”, in other words while processing the key-after-move or in an auto-report string. So IF you’re checking them in a situation where they haven’t just been moved (e.g. you’re looking at them during a right-click menu item, when you haven’t just done a Send To Location on them in this command sequence) then I think you may be using them “off spec”.

So that tells me you’re seeing A Separate Bug (or a Separate Alleged Bug if it turns out you’re using the traits a bit off-spec). The fix here would be we’d need to incorporate the changes to the OldZone (et al) into the Command sequence (so that they would undo properly) rather than just writing them directly.

Bugs 2, Me 0…

Brian

Okay, I think I’ve stumbled upon a new bug that may or may not be related to undo. In my test module, I decided to dump OldX and OldY too. I started moving my pieces around from zone to zone before I undo’d, and it failed! At some point, two pieces in adjacent zones stacked. The one in ‘2’ decided to skip ‘3’ where it was supposed to go, and it jumped to ‘4’ to stack with its brother. No undo involved. WTH!? These zones are squares, 200 pixels on a side! The stacking area of effect can’t be 200 pixels, can it?

To check, I turned off stacking in the map, and now it works, and it works even after an undo. In my main module, it always failed after an undo, but perhaps it has to do with this stacking bug, and I just read it wrong. I’ll check in a moment.

So my apologies for leading everyone astray. I was convinced this was an undo problem. Undo may have its hands in it, but I can’t prove it. Sorry. I’ll go now.

Okay, I’m back. One last time, I promise. I revisited my main module and disabled stacking in my player map. When tested, it all worked correctly. Undo did not fail. Perfecto.
SOoooo, if you have a test case that fails undo, just for laughs, try disabling stacking and see what happens.

The 12554 case ALSO works right w/o stacking. (The fact that the stacking code is WAY more complicated than the non-stacking version makes it ultimately small surprise that it is implicated in all this). Good information though, thanks! Brian

Brian, The main difference is that SendToLocation is calling Map.placeOrMerge() which is supposed to be a one-stop shop to move a piece and handle placement/merging/new stack creation etc. on the destination map at the destination map co-ords, rather than having to handle all of that ‘manually’ as the PieceMover does.

The fact that turning off stacking fixes the problem points to placeOrMerge() not handling the new stack creation correctly. On a map with stacking enabled, non-Do-Not-Stack pieces are not allowed to exist in a raw state, they have to be part of a Stack, so any movement to an empty location on a stacking map needs to first create and place an empty Stack at the location, then merge the moving piece into the new empty stack.

But Map.placeOrMerge() is the second of the two code-fragments I listed above – it appears to have code near-identical in meaning to the first code fragment. And specifically it looks like it (should(?)) be doing a createStack() and should(?) be appending a placeAt() result to the command, which would mean another AddPiece in the command sequence.

I think I’m giving up on this one guys. I’ve already got well over 40 hours of research and debug-tracing into it and I’m not really making headway anymore toward being able to propose some kind of fix.

I’ve left most of the research I’ve done on it attached to the bug, and if someone who understands the underlying mechanics of stacks & commands better wants to have a crack at it then I’m certainly happy to bounce ideas back and forth if useful.

It’s a bug I would really really like to see fixed, because all of the modules I use to play suffer from it, and I was really really hoping to be able to contribute to it, but I just can’t find a way forward.

Thanks for your help Brian, you’ve helped narrow down where the problem is.

Brian, just as an aside, is there a reason you don’t just turn ‘Snap to defined point?’ on in the irregular grid you’ve created? You don’t need the Send To Location then.

Alternatively, I usually handle these by creating a Rectangular grid instead of an irregular grid, adding a gridnumbering, aligning the Column number with the Track number and setting the Location Format to $column$ and turn on Grid Snapping. Save loads of work.

Rgds.

Quick report on the current state of things. I traced the source of the bug up to the DragBuffer.

The “thing” being actually moved is a ReportState object.

The very first move by the user puts the wrong thing into the DragBuffer, it puts the Stack there, which contains the ReportState. It looks like things work fine if the actual ReportState is in the DragBuffer, and not the Stack that contains the ReportState.

If I replace this Stack in the DragBuffer with the ReportState contained in the Stack, then everything seems to work. And this has to be done only once, afterwards all further move and undo actions work perfectly fine.

Example, the log is from the method DragBuffer.add() which fires when the user clicks and holds the counters:

  • The very first move by the user, observe how the DragBuffer contains a Stack:
2020-06-01 09:52:28,048 [0-AWT-EventQueue-0] DEBUG VASSAL.counters.DragBuffer - dragBuffer.add: VASSAL.counters.Stack@6db2a5c0[Blockade Marker]
  • I correct this during debugging by just replacing the Stack in the dragbuffer with the ReportState object that is inside the Stack
  • Then comes the undo which works fine
  • Then I can move the counter forward and undo, as many times as I want, every time the DragBuffer contains the correct object and nothing needs to be fixed anymore, observe how the DragBuffer contains the ReportState:
2020-06-01 09:55:21,425 [0-AWT-EventQueue-0] DEBUG VASSAL.counters.DragBuffer - dragBuffer.add: VASSAL.counters.ReportState@51a27fc[name=Blockade Marker,blablamorestuff]

Next questions I want to solve:

  • who puts the wrong object into the DragBuffer for the initial move, and why
  • why, once fixed, do the moves 2-N work correctly, why is the correct object put into the DragBuffer, what is different between move 1 and moves 2-N
  • is there any unwanted side effects by putting the ReportState into the DragBuffer instead of the Stack

Also, I browsed the other posts in this thread, sorry for going a bit over the edge, I have very high standards where code automatically deteriorates if it is not cleaned up and refactored into a manageable state, and thanks a lot for the input from whoever else analyzed this issue, and I completely understand why fixing this in the form of a “dirty hack” is better, safer and more backward-compatible than a full heart surgery. Once I narrowed down the source of the bug and found a way to fix it, I will present the solution and then we can still think about how we package it, how much we change and how much we leave as it is. I think I am almost there, I can already fix the bug when running inside the debugger, now I just have to follow the call hierarchy and find the exact source of the bug.

One other observation I have made - Stacks seem to be created dynamically whenever they are needed, and this creating of the Stacks and the whole merging functionality seems to work well (enough), and this Stack stuff is not related to the undo bug.