If we look back to the point at which the stack stepped into the danger zone, we see that it was caused by the 60Hz EXEC VBLANK interrupt kicking in:
6645 84F0 0000 0002 1EEA 0002 0310 168F S---I--- XORR R1,R0 E2B5 84F0 0000 0002 1EEA 0002 0310 1690 S---I-iq RLC R0 ; Start of ISR E2B5 84F0 0000 0002 1EEA 0002 0311 1004 S---I-iQ PSHR R0 >>> Max stack increased to 0312 by code at 1004 E2B5 84F0 0000 0002 1EEA 0002 0312 1005 S---I--- GSWD R0 8080 84F0 0000 0002 1EEA 0002 0312 1006 S---I--- PSHR R0 >>> Max stack increased to 0313 by code at 1006 8080 84F0 0000 0002 1EEA 0002 0313 1007 S---I--- PSHR R1 >>> Max stack increased to 0314 by code at 1007 8080 84F0 0000 0002 1EEA 0002 0314 1008 S---I--- PSHR R2 >>> Max stack increased to 0315 by code at 1008 8080 84F0 0000 0002 1EEA 0002 0315 1009 S---I--- NOP 8080 84F0 0000 0002 1EEA 0002 0315 100A S---I-i- PSHR R3 >>> Max stack increased to 0316 by code at 100a 8080 84F0 0000 0002 1EEA 0002 0316 100B S---I--- PSHR R4 >>> Max stack increased to 0317 by code at 100b 8080 84F0 0000 0002 1EEA 0002 0317 100C S---I--- PSHR R5 >>> Max stack increased to 0318 by code at 100c 8080 84F0 0000 0002 1EEA 0002 0318 100D S---I--- MOVR R7,R5 8080 84F0 0000 0002 1EEA 100E 0318 100E ----I-i- ADDI #$0006,R5 8080 84F0 0000 0002 1EEA 1014 0318 1010 ----I-i- MVII #$0100,R4 8080 84F0 0000 0002 0100 1014 0318 1012 ----I-i- SDBD 8080 84F0 0000 0002 0100 1014 0318 1013 ----ID-- MVI@ R4,R7 8080 84F0 0000 0002 0102 1014 0318 52B2 ----I-i- CMPI #$0316,R6 8080 84F0 0000 0002 0102 1014 0318 52B4 -C--I-i- B $52EF 8080 84F0 0000 0002 0102 1014 0318 52EF -C--I-i- MVI $01D0,R0 0000 84F0 0000 0002 0102 1014 0318 52F1 -C--I-i- MVO R0,$0031 0000 84F0 0000 0002 0102 1014 0318 52F3 -C--I--- NEGR R0 0000 84F0 0000 0002 0102 1014 0318 52F4 -C-ZI-i- ADDI #$0008,R0 0008 84F0 0000 0002 0102 1014 0318 52F6 ----I-i- CMP $0119,R0 0008 84F0 0000 0002 0102 1014 0318 52F8 -C-ZI-i- BEQ $5309 0008 84F0 0000 0002 0102 1014 0318 5309 -C-ZI-i- J $1126 0008 84F0 0000 0002 0102 1014 0318 1126 -C-ZI-i- PSHR R5 >>> Max stack increased to 0319 by code at 1126 ; Welcome to the danger zone... 0008 84F0 0000 0002 0102 1014 0319 1127 -C-ZI--- MVI $0102,R1 0008 0001 0000 0002 0102 1014 0319 1129 -C-ZI-i- SWAP R1
The ISR firing in this way is entirely normal. So the question then is why is the stack up at $310 when the the EXEC kicks in?
If we now mark up the start (address $1004) and end (address $101c) of all VBLANK interrupt service routines, we see something very interesting. It looks as though the ISR routine that immediately precedes this one does not end! It gets kicked off with a stack value of $303, runs for a 60th of a second and then is interrupted again with R6 equalling $310. Further, apart from the initial stack check code (which would pass as the value is $30b at the time)...
0000 0000 06A1 0037 0100 1014 030B 1012 ----I-i- SDBD 0000 0000 06A1 0037 0100 1014 030B 1013 ----ID-- MVI@ R4,R7 ; Let's do the stack check... 0000 0000 06A1 0037 0102 1014 030B 52B2 ----I-i- CMPI #$0316,R6 ; Yup everything would be good, even if the following instruction ; did not ignore the test result 0000 0000 06A1 0037 0102 1014 030B 52B4 S---I-i- B $52EF 0000 0000 06A1 0037 0102 1014 030B 52EF S---I-i- MVI $01D0,R0 0000 0000 06A1 0037 0102 1014 030B 52F1 S---I-i- MVO R0,$0031 0000 0000 06A1 0037 0102 1014 030B 52F3 S---I--- NEGR R0 0000 0000 06A1 0037 0102 1014 030B 52F4 -C-ZI-i- ADDI #$0008,R0 0008 0000 06A1 0037 0102 1014 030B 52F6 ----I-i- CMP $0119,R0 0008 0000 06A1 0037 0102 1014 030B 52F8 -C-ZI-i- BEQ $5309 0008 0000 06A1 0037 0102 1014 030B 5309 -C-ZI-i- J $1126 0008 0000 06A1 0037 0102 1014 030B 1126 -C-ZI-i- PSHR R5 0008 0000 06A1 0037 0102 1014 030C 1127 -C-ZI--- MVI $0102,R1
...all this time is spent in the EXEC, rather than game code. This is weird, because it suggests that the data used by the EXEC is already corrupt in some way.
With a bit more hacking of JzIntv we can check for examples of one ISR handler crashing into another. It turns out this is very common in WCB. Not only do all games that crash or freeze do it, but so do all games that play through fine. In fact, because games that don't crash last longer, incidences of ISR crashing are typically higher in these games. There were an average of 29 ISR crashes in our original 100 games that played through to completion, verses 27 in the 100 games that crashed originally.
The obvious question is, do any other games exhibit such ISR crashes? Having checked a few titles, here are the results:
- Original Baseball - no ISR crashes seen
- Original Football - a small number of ISR crashes seen when initialising play
- Triple Action - one or two ISR crashes seen when initialising the sub games
- Utopia - no ISR crashes seen
- Tron Deadly Discs - no ISR crashes seen
- BurgerTime - no ISR crashes seen
- Masters of the Universe - a large number of ISR crashes seen when initialising each screen
- Super Pro Football - ISR crash city, significantly worse than WCB
- Super Pro Basketball - a small number of ISR crashes seen
- Super Pro Hockey - a similar number of ISR crashes seen to WCB
- Thunder Castle - no ISR crashes seen
(Apologies for the double-comment on the other thread. Apparently hitting "Back" is a bad idea?)
ReplyDeleteOK, so this confirms the thought I had that the crash is indeed due to bad interrupt nesting.
One hamfisted way to "fix" this is to have the ISR check for reentrancy and exit. Space Patrol has such a flag, although it doesn't check the flag until just before the "not safe to reenter" portion. It'll still update certain display elements, which avoids obvious hitches in the animations.
[code]
;;------------------------------------------------------------------;;
;; Remainder of engine is not safely reentrant. ;;
;;------------------------------------------------------------------;;
DIS ; 4 \
MVII #1, R1 ; 8 | !!Critical Section!!
CMP EBSY, R1 ; 10 | Check our "ENGINE IS BUSY" flag,
; |-- and if we're busy, just drop a
MVO R1, EBSY ; 11 | frame. :-( Not that I like that.
BNEQ E1.not_busy ; 9 /
;----
; 48
; 19
;====
; 67
ADD FDROP, R1 ; \ If we drop a frame, disable all
; |__ tasks for this frame by just
MVO R1, FDROP ; | returning Also, increment our
JE MISCISR.1 ; / frame-drop count. (debugging)
E1.not_busy:
[/code]
It's waaay to easy to run over on cycles when you only get a couple thousand instructions between interrupts, usually fewer.
On NTSC, you get 14934 cycles between interrupts, and lose about 10% of them to cycle stealing. In practice, the average cycles/instruction is between 8 and 9, meaning you get 1500 - 1600 instructions between retrace interrupts. If you manage to stick to 6 cycle instructions, you get a bit over 2200.
Ah, it rejected my pre tag, and it doesn't understand [code] either. So... enjoy ugly code?
DeleteOne last thought: WC Baseball makes many calls to RANDRC (aka. X_RAND2 in dis1600), which is the "random number in range". RANDRC gets a random number of bits from RANDY, and then does a range compare, discarding out-of-range values.
ReplyDeleteThe BSRs have already implicated this function in other crashes, because it can loop an arbitrarily large number of times in unlucky cases. And RANDY itself is pretty slow,, as I've documented elsewhere—something like 92 cycles/bit.
One call to RANDRC asked for a random number < $50. That's 7 bits. The odds of a 7-bit number being >= $50 is 48/128, or a bit over 1/3rd. The cost of generating 7 random bits and checking them is going to be close to 700 cycles.
There other calls, some of which whose thresholds aren't constants. I suspect RANDRC may be an ingredient in the crashes.