Anatomy of a crash pt.1




If we want to understand what is going on, it is important to only consider genuine crashes or freezes, rather than false positives generated by the diagnostic screen's stack pointer check.  Therefore, the first thing to do is disable the stack check and go in search of a crash or freeze that occurs nice and early in the game.  A quick crash will help by reducing both the amount of state in play and the time to test any theories.

In game number $2a4 we find just what we need.  WCB crashes in the first inning, before a single pitch has been thrown, or the batter has even made it to the plate!  In fact, things come to a grinding halt just about here...


At this point the stack high tide mark is already $31e and the CPU goes off into the weeds with a program counter value of $402.  The fact that the game crashes before play has begun is really interesting.  It probably means that there has not been much opportunity for game specific state to exist.  Further, because "only" about 500,000 instructions have been executed, it is practical to compare the execution path of this crashing game side by side with that of a game that plays this far without problem.

By monitoring writes to specific addresses made by a game that does not crash, we can identify the lowest address used by the EXEC in WCB.  This was found to be $319; any stack write to this address or higher risks corruption and a crash.  Because the stack post-increments its pointer on writes and pre-decrements it on reads, R6 equalling $319 represents the true "highest safe value" of the stack pointer: the most recent push landed in $318, and the next one would land in $319.
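
The push/pull behaviour described above can be modelled in a few lines of Python. This is only a sketch for reasoning about the limit; the `Stack` class and `SAFE_LIMIT` name are made up for illustration and are not part of the game or the emulator.

```python
SAFE_LIMIT = 0x319  # lowest EXEC-owned address reported in the text


class Stack:
    """Toy model of an upward-growing stack addressed through R6."""

    def __init__(self, sp):
        self.sp = sp      # current value of R6
        self.mem = {}     # addresses that have been written

    def push(self, value):
        # Write at the current pointer, then increment (post-increment).
        self.mem[self.sp] = value & 0xFFFF
        self.sp += 1

    def pull(self):
        # Decrement first, then read (pre-decrement).
        self.sp -= 1
        return self.mem[self.sp]


stack = Stack(0x318)
stack.push(0x1234)                    # lands in $318, R6 becomes $319
safe_after_first = SAFE_LIMIT not in stack.mem
stack.push(0x5678)                    # lands in $319: EXEC data overwritten
clobbered_after_second = SAFE_LIMIT in stack.mem
```

With R6 at $319 nothing has been damaged yet; it is the next push that overwrites $319.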

Let's start by trying to understand how we end up with a PC value of $402.  Using jzIntv's ability to script a run and trace instructions, it is relatively easy to identify the culprit:

   6790 000B 000C 0000 000C 1289 02F7 14A9 -C--I-i-  MVI  $031C,R4
   6790 000B 000C 0000 0402 1289 02F7 14AB -C--I-i-  TSTR R4
   6790 000B 000C 0000 0402 1289 02F7 14AC -C--I-i-  BEQ  $135C
   6790 000B 000C 0000 0402 1289 02F7 14AE -C--I-i-  MVI  $011B,R0
   0001 000B 000C 0000 0402 1289 02F7 14B0 -C--I-i-  JR   R4
   0001 000B 000C 0000 0402 1289 02F7 0402 -C--I-i-  XORI #$FFFF,R7
  CPU off in the weeds @ PC == 0402, w = ffff
  instruction count: 531577

Although the stack pointer in R6 has a value of $2f7 at the moment of the crash, the high water mark prior to the crash was $31e. Things go west when the code at $14a9 loads the value in $31c into R4, which is subsequently used as a return address at $14b0. In this instance the value read is $402. Looking back, this value resulted from the following small fragment.

   ; Load R2 from the address contained in R4 ($13) and
   ; fiddle with it a bit

   0001 0000 0002 002F 0013 0000 02F7 1274 -C-ZI---  MVI@ R4,R2     
   0001 0000 0002 002F 0014 0000 02F7 1275 -C-ZI-i-  SWAP R2        
   0001 0000 0200 002F 0014 0000 02F7 1276 -C--I---  RRC  R2,2      
   0001 0000 4080 002F 0014 0000 02F7 1277 S---I---  CLRR R3        
   0001 0000 4080 0000 0014 0000 02F7 1278 ---ZI-i-  RLC  R3,2      

   ; Load R1 from the address contained in R4 ($14) and
   ; fiddle with it a bit

   0001 0000 4080 0000 0014 0000 02F7 1279 ---ZI---  MVI@ R4,R1     
   0001 0004 4080 0000 0015 0000 02F7 127A ---ZI-i-  SWAP R1        
   0001 0400 4080 0000 0015 0000 02F7 127B ----I---  RRC  R1        
   0001 0200 4080 0000 0015 0000 02F7 127C ----I---  RLC  R3        
   0001 0200 4080 0000 0015 0000 02F7 127D ---ZI---  CMPR R5,R3     
   0001 0200 4080 0000 0015 0000 02F7 127E -C-ZI-i-  BNEQ $128B     
   0001 0200 4080 0000 0015 0000 02F7 1280 -C-ZI-i-  SLL  R1        
   0001 0400 4080 0000 0015 0000 02F7 1281 -C--I---  SLL  R2,2      
   0001 0400 0200 0000 0015 0000 02F7 1282 -C--I---  SWAP R2        

   ; Combine the values in R1 and R2 and write the result to $31c

   0001 0400 0002 0000 0015 0000 02F7 1283 -C--I---  XORR R1,R2   
   0001 0400 0402 0000 0015 0000 02F7 1284 -C--I-i-  MVO  R2,$031C
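
The arithmetic in this fragment can be replayed outside the emulator. The Python sketch below is inferred from the register values in the trace rather than taken from a CPU reference: the helper names are hypothetical, the two-bit rotate is modelled so that the old carry lands in bit 14 and the old overflow in bit 15 (which is what the trace shows), and the R3 bookkeeping is left out because R3 stays zero here.

```python
MASK = 0xFFFF  # registers are 16 bits wide


def swap(x):
    """Exchange the high and low bytes of a 16-bit value."""
    return ((x << 8) | (x >> 8)) & MASK


def rrc2(x, carry, overflow):
    """Rotate right by two through carry/overflow, as seen at $1276."""
    return ((x >> 2) | (carry << 14) | (overflow << 15)) & MASK


def rrc1(x, carry):
    """Rotate right by one through carry, as seen at $127B."""
    return ((x >> 1) | (carry << 15)) & MASK


def sll(x, n=1):
    """Shift left logical by n bits, discarding overflowed bits."""
    return (x << n) & MASK


def word_written_to_31c(word_a, word_b, carry_in=1, overflow_in=0):
    # word_a and word_b are the values fetched through R4 ($13 and $14).
    r2 = rrc2(swap(word_a), carry_in, overflow_in)
    # The trace shows the carry flag clear by the time RRC R1 executes.
    r1 = sll(rrc1(swap(word_b), 0))
    r2 = swap(sll(r2, 2))
    return r1 ^ r2


result = word_written_to_31c(0x0002, 0x0004)
print(f"${result:04X}")  # $0402 for the values seen in the trace
```

Feeding in the two words the trace loaded ($0002 and $0004) gives $0402, the value that later ends up in the program counter.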

So the question is: where does the value of R4, which is driving this code, come from? Backtracking further, we see the following immediately preceding the fragment above:

   ; Load R1 from $31b and use this as an address to initialise R4

   0001 000B 0002 002F 6000 120B 02F2 1226 S---I-i-  MVI  $031B,R1     
   0001 000B 0002 002F 6000 120B 02F2 1228 S---I-i-  INCR R1           
   0001 000C 0002 002F 6000 120B 02F2 1229 ----I-i-  MVI@ R1,R4        

   ; Bit of fiddling

   0001 000C 0002 002F 000C 120B 02F2 122A ----I-i-  TSTR R4           
   0001 000C 0002 002F 000C 120B 02F2 122B ----I-i-  BEQ  $1217        
   0001 000C 0002 002F 000C 120B 02F2 122D ----I-i-  SDBD              
   0001 000C 0002 002F 000C 120B 02F2 122E ----ID--  MVI@ R5,R7        
   0001 000C 0002 002F 000C 120D 02F2 122F ----I-i-  PSHR R5

   ; Store R4 in $319 - interesting, if not directly relevant
   
   0001 000C 0002 002F 000C 120D 02F3 1230 ----I---  MVO  R4,$0319 

   ; More fiddling, which leads to the code we looked at earlier

   0001 000C 0002 002F 000C 120D 02F3 1232 ----I---  MVI@ R4,R1        
   0001 000C 0002 002F 000D 120D 02F3 1233 ----I-i-  ANDI #$0300,R1    
   0001 0000 0002 002F 000D 120D 02F3 1235 ---ZI-i-  BNEQ $1249        
   0001 0000 0002 002F 000D 120D 02F3 1237 ---ZI-i-  MVI  $011C,R0     
   0002 0000 0002 002F 000D 120D 02F3 1239 ---ZI-i-  AND  $0110,R0     
   0000 0000 0002 002F 000D 120D 02F3 123B ---ZI-i-  BEQ  $1249        
   0000 0000 0002 002F 000D 120D 02F3 1249 ---ZI-i-  ADDI #$0005,R4    
   0000 0000 0002 002F 0012 120D 02F3 124B ----I-i-  MVI@ R4,R0        
   0002 0000 0002 002F 0013 120D 02F3 124C ----I-i-  SARC R0           
   0001 0000 0002 002F 0013 120D 02F3 124D ----I---  BNC  $1261        
   0001 0000 0002 002F 0013 120D 02F3 1261 ----I-i-  BEQ  $1225        
   0001 0000 0002 002F 0013 120D 02F3 1263 ----I-i-  MVI  $011B,R1     
   0001 0001 0002 002F 0013 120D 02F3 1265 ----I-i-  ADDI #$0107,R1    
   0001 0108 0002 002F 0013 120D 02F3 1267 ----I-i-  MVI@ R1,R1        
   0001 0108 0002 002F 0013 120D 02F3 1267 ----I-iB  MVI@ R1,R1        
   0001 0001 0002 002F 0013 120D 02F3 1268 ----I-i-  ANDI #$00FF,R1    
   0001 0001 0002 002F 0013 120D 02F3 126A ----I-i-  BEQ  $1225        
   0001 0001 0002 002F 0013 120D 02F3 126C ----I-i-  CLRR R5           
   0001 0001 0002 002F 0013 0000 02F3 126D ---ZI-i-  SARC R1           
   0001 0000 0002 002F 0013 0000 02F3 126E -C-ZI---  BNC  $1292        
   0001 0000 0002 002F 0013 0000 02F3 1270 -C-ZI-i-  PSHR R5           
   0001 0000 0002 002F 0013 0000 02F4 1271 -C-ZI---  PSHR R4           
   0001 0000 0002 002F 0013 0000 02F5 1272 -C-ZI---  PSHR R1           
   0001 0000 0002 002F 0013 0000 02F6 1273 -C-ZI---  PSHR R0

So, despite a bit of fiddling, R4 is ultimately derived from the value in $31b (it is loaded from the address one past the one that value points to), and $31b will also have been corrupted by the stack. Also note that this fragment of code makes use of $319. Looking at the writes to $31b, we see something very interesting:

   WR a=$031B d=031D CP-1610          (PC = $1204) t=4262127
   WR a=$031B d=0325 CP-1610          (PC = $1204) t=4263417
   WR a=$031B d=032D CP-1610          (PC = $1204) t=4263571
   WR a=$031B d=0335 CP-1610          (PC = $1204) t=4263725
   WR a=$031B d=033D CP-1610          (PC = $1204) t=4263987
   WR a=$031B d=0345 CP-1610          (PC = $1204) t=4264141
   WR a=$031B d=034D CP-1610          (PC = $1204) t=4264295
   WR a=$031B d=0355 CP-1610          (PC = $1204) t=4264449
   WR a=$031B d=031D CP-1610          (PC = $1204) t=4306929
   WR a=$031B d=0325 CP-1610          (PC = $1204) t=4308219
   WR a=$031B d=032D CP-1610          (PC = $1204) t=4308373
   WR a=$031B d=0335 CP-1610          (PC = $1204) t=4308527
   WR a=$031B d=033D CP-1610          (PC = $1204) t=4308789
   WR a=$031B d=0345 CP-1610          (PC = $1204) t=4308943
   WR a=$031B d=034D CP-1610          (PC = $1204) t=4309097
   WR a=$031B d=0355 CP-1610          (PC = $1204) t=4309251
   WR a=$031B d=031D CP-1610          (PC = $1204) t=4352119
  >>> Max stack increased to 0310 by code at 142b
  >>> Max stack increased to 0311 by code at 143f
  >>> Max stack increased to 0312 by code at 1004
  >>> Max stack increased to 0313 by code at 1006
  >>> Max stack increased to 0314 by code at 1007
  >>> Max stack increased to 0315 by code at 1008
  >>> Max stack increased to 0316 by code at 100a
  >>> Max stack increased to 0317 by code at 100b
  >>> Max stack increased to 0318 by code at 100c
  >>> Max stack increased to 0319 by code at 1126
  >>> Max stack increased to 031a by code at 1154
  >>> Max stack increased to 031b by code at 11d1
  >>> Max stack increased to 031c by code at 143f
   WR a=$031B d=142A CP-1610          (PC = $143F) t=4394630
   WR a=$031B d=0003 CP-1610          (PC = $142A) t=4394899
  >>> Max stack increased to 031d by code at 142b
  >>> Max stack increased to 031e by code at 143f
   WR a=$031B d=000B CP-1610          (PC = $1204) t=4443731
  CPU off in the weeds @ PC == 0402, w = ffff
  instruction count: 531521

The value in $31b appears to be a pointer to the base address of an 8-word data structure used to hold sprite data. When operating correctly, it cycles through the data for the 8 sprites held in addresses $31d to $35c. However, once the stack overwrites it (first with $142a, then with $0003), it does not take long for $31c to be corrupted and things to come to a screeching halt.
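
Spotting the bad writes by eye is easy in a short log, but a small filter helps with longer ones. The following Python sketch assumes the watch output has the same line format as the excerpt above; the function and constant names are illustrative only.

```python
import re

SPRITE_BASE = 0x31D    # first sprite record
SPRITE_STRIDE = 8      # addresses per record
SPRITE_COUNT = 8
VALID_POINTERS = {SPRITE_BASE + SPRITE_STRIDE * i for i in range(SPRITE_COUNT)}

WRITE_RE = re.compile(
    r"WR a=\$([0-9A-Fa-f]{4}) d=([0-9A-Fa-f]{4}).*"
    r"\(PC = \$([0-9A-Fa-f]{4})\) t=(\d+)"
)


def suspicious_writes(log_text, watch=0x31B):
    """Return (time, pc, data) for writes to `watch` that are not sprite pointers."""
    hits = []
    for line in log_text.splitlines():
        m = WRITE_RE.search(line)
        if not m:
            continue  # skip "Max stack" lines and anything else
        addr, data, pc = (int(m.group(i), 16) for i in (1, 2, 3))
        if addr == watch and data not in VALID_POINTERS:
            hits.append((int(m.group(4)), pc, data))
    return hits


SAMPLE_LOG = """\
   WR a=$031B d=0355 CP-1610          (PC = $1204) t=4309251
   WR a=$031B d=031D CP-1610          (PC = $1204) t=4352119
  >>> Max stack increased to 031c by code at 143f
   WR a=$031B d=142A CP-1610          (PC = $143F) t=4394630
   WR a=$031B d=0003 CP-1610          (PC = $142A) t=4394899
   WR a=$031B d=000B CP-1610          (PC = $1204) t=4443731
"""

bad = suspicious_writes(SAMPLE_LOG)
for t, pc, data in bad:
    print(f"t={t} PC=${pc:04X} wrote ${data:04X}")
```

On the sample lines this flags the writes of $142A, $0003 and $000B, and passes over the normal sprite pointers.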

So we understand that the crash is caused by the stack corrupting the EXEC's memory space, in this case $31b. More generally, we should try to keep the value of R6 from progressing above $319, as this is the lowest address used by the EXEC in WCB.
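
The same log can also tell us where the stack first crossed the line. The sketch below is a rough helper, assuming the ">>> Max stack increased" messages report the value of R6 after a push and keep the format shown in the log above; `first_overrun` is a hypothetical name.

```python
import re

SAFE_LIMIT = 0x319  # highest safe value of R6 according to the text

MAX_STACK_RE = re.compile(
    r"Max stack increased to ([0-9A-Fa-f]+) by code at ([0-9A-Fa-f]+)"
)


def first_overrun(log_text, limit=SAFE_LIMIT):
    """Return (stack_value, code_address) for the first mark above `limit`."""
    for line in log_text.splitlines():
        m = MAX_STACK_RE.search(line)
        if m and int(m.group(1), 16) > limit:
            return int(m.group(1), 16), int(m.group(2), 16)
    return None  # the stack stayed within bounds


SAMPLE_LOG = """\
  >>> Max stack increased to 0318 by code at 100c
  >>> Max stack increased to 0319 by code at 1126
  >>> Max stack increased to 031a by code at 1154
  >>> Max stack increased to 031b by code at 11d1
"""

overrun = first_overrun(SAMPLE_LOG)
if overrun:
    print(f"R6 first reached ${overrun[0]:04X} in code at ${overrun[1]:04X}")
```

For the excerpt used here, the first mark above the limit is $031A, reached by the code at $1154.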

The next question is: why is the stack behaviour so variable, especially this early in the game? And is there any way to reduce the game's stack usage?
