Making matters worse?



Today we'll try to answer the question of whether the WCB diagnostic stack check increases the likelihood of a crash occurring.

If we change the BLE L_52EF instruction at $52b4 to a B L_52EF, we neuter the test: diagnostic screens will never be drawn, and the game should crash in some other way.  To check this, we can repeat our test of 100 crashing games and see what happens.

Without the stack size check in place, only 11 of the 100 games that originally generated the diagnostic screen now crash (black screen), and another 28 freeze before the 9th innings (games may also end early in the 9th, but there is no easy way to distinguish these from games that correctly play through to the end).  Oh dear! So the stack check does seem to increase the likelihood of the game ending early by more than a factor of two: 100 games with the check against 39 without it.  Not good.
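For anyone who wants to check the arithmetic, here is a minimal sketch in Python that tallies the figures quoted above.  The variable names are my own, and the count of games that played on is an upper bound, since early endings inside the 9th are not detected.

```python
# Outcome counts reported above for the 100 games that originally
# ended on the diagnostic screen (variable names are illustrative only).
total_games = 100
crashes = 11   # black screen with the check disabled
freezes = 28   # froze before the 9th innings with the check disabled

# With the check in place, every one of these games ended early.
early_with_check = total_games
early_without_check = crashes + freezes

# Upper bound: some of these may still have ended early in the 9th.
played_on = total_games - early_without_check

ratio = early_with_check / early_without_check

print(f"early endings without the check: {early_without_check}")
print(f"games that appear to play on:    {played_on}")
print(f"with-check / without-check:      {ratio:.2f}")
```

The ratio comes out a little above 2.5, which is where the "more than a factor of two" comes from.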

If the game does still crash, the stack high tide is either $31d or $320, caused by the PSHR R5 at $143f.  If the game freezes, the high tide mark always seems to be $31f, also generated by the code at $143f.  Finally, if the game plays to a conclusion, the high tide mark is between $319 and $31f, established by the code at either $143f or $1669.  Notice that these stack values are all significantly lower than the $32b to $330 seen when the check is in place.  Also, the actual high tide stack value does not seem to be a good discriminator as to whether the game will crash, freeze or play to the end.
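The overlap between the outcomes is easier to see when written out as sets.  This is only a sketch: it assumes that any value in the $319 to $31f range can turn up in a completed game, which is a simplification of the observations above, and the names are mine.

```python
# High tide marks reported above with the stack check disabled.
crash_marks = {0x31D, 0x320}
freeze_marks = {0x31F}
# Assumption: treat the whole observed range for completed games as possible.
complete_marks = set(range(0x319, 0x31F + 1))  # $319..$31f inclusive

# Range seen when the check is in place, for comparison.
checked_marks = set(range(0x32B, 0x330 + 1))   # $32b..$330 inclusive

bad_marks = crash_marks | freeze_marks

# Marks that appear both in failed games and in completed games.
ambiguous = sorted(bad_marks & complete_marks)
# Marks seen only in failed games.
only_bad = sorted(bad_marks - complete_marks)
# Distance from the highest unchecked mark to the lowest checked mark.
gap = min(checked_marks) - max(bad_marks | complete_marks)

print("ambiguous:", [f"${v:03x}" for v in ambiguous])
print("only bad: ", [f"${v:03x}" for v in only_bad])
print("gap to checked range:", gap, "words")
```

Only $320 is unique to a bad outcome in this data; $31d and $31f appear on both sides, which is why the high tide mark alone cannot predict what the game will do.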

It should be noted that these results are only definitive when playing computer vs computer games at level 3 on JzIntv.  The stack usage may be different at other AI difficulties, or if no AI is present, and this may make matters better, or worse.  In reality I suspect that Mike's priority was games with at least one human player.  However, these are much harder to test, as controller input will be a source of entropy in the random number generator.  This would necessitate the construction of dedicated hardware (think Ben Heck's Dragster replay hardware) to make games repeatable and crashes reproducible.

So what have we learned?  Well, it looks as though Mike was correct that the problem is caused by a stack overflow.  Whilst his test to monitor the stack does capture the vast majority of cases, there is evidence that it might be too aggressive and falsely predicts that the game will crash.  If INTV had wanted to reduce the incidence of crashes, one simple and effective option would have been to remove the stack check from the production code.  That said, Mike's check does seem to get rid of all game freezes by converting them to diagnostic screens.  Perhaps INTV thought that such freezes were more annoying for players than the diagnostic screen and decided to keep the check in to banish them.

Next I think we will take a look at the anatomy of one of the crashes and see if we can work out what is going on.
