While I beaver away on a more detailed description of Rick's debugger, here is a rough sketch of its code and architecture.
Overall the debugger is structured around an interactive UI event loop, starting at address $d575. This routine reads and interprets control inputs and then dispatches a handler based on any input provided. The handlers do things like change the currently active field, change the value of a field, single step instructions or run to breakpoints etc.
This interactive loop is reached by calling one of the ten entry points from the game under test. These subroutines will enter the interactive mode of the debugger under one of three different conditions: unconditionally, if both bottom action buttons are pressed, or if one of eight breakpoints is active. If the necessary condition is not met, control is returned to the game. On entry to interactive mode, the debugger caches the state of the CPU and the game screen into its workspace RAM. Then it draws the debugger screen detailing the current state of the CPU and falls into the UI loop.
Looking at the 13 commands that can be given to the debugger in a little more detail, they fall into 3 groups - commands to change which debugger field has focus, commands to change the value in the currently focused field and finally a set of commands to execute instructions.
The first set of commands, which change which field has the focus, are relatively simple. They increment or decrement the current field value held in $d2aa, checking for overflow or underflow before falling back into the main loop and forcing a screen redraw. The handler for showing the game screen while a disc is pressed is also part of this group. Effectively the game screen is another debugger field and pressing the disc gives it the focus. To do this the disc handler swaps the BACKTAB with the version cached in workspace RAM and waits for the disc to be released, at which point it swaps the screen back again, reshowing the debugger.
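The screen flip described above can be pictured as a word-by-word exchange between the live BACKTAB and the cached copy. The following is a minimal Python sketch of that idea, not the debugger's own CP1610 code. The BACKTAB location ($0200, 240 words) is the standard Intellivision memory map; the function name and the way memory and the cache are modelled as lists are assumptions made for illustration.

```python
BACKTAB_START = 0x0200  # BACKTAB base in the Intellivision memory map
BACKTAB_WORDS = 240     # 20 columns x 12 rows of cards


def swap_screen(memory, cache):
    """Exchange the live BACKTAB with the cached copy, word by word.

    `memory` models system RAM indexed by address and `cache` models the
    240-word screen copy in workspace RAM (both hypothetical stand-ins).
    """
    for i in range(BACKTAB_WORDS):
        addr = BACKTAB_START + i
        memory[addr], cache[i] = cache[i], memory[addr]
```

Because the operation is a swap rather than a copy, calling it a second time restores both screens, which matches the "flip, wait for release, flip back" behaviour of the disc handler.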
The next group of commands are a bit more complex. These change the value stored in the currently active field by incrementing, decrementing, setting the value to 0 or inserting a hex value into the field. Again, the field value is adjusted before returning to the main UI loop and redrawing the screen. There are a few things that are worthy of note here. Firstly, all fields are stored as 16-bit values in two 8-bit RAM locations, even single-bit fields like the CPU flag values. Although this is wasteful, it simplifies the code used to interact with fields. Secondly, when incrementing the value of the program counter in R7, the debugger partially decodes the current instruction and uses this to skip to the start of the next instruction, stepping over any operands, rather than always incrementing by one. This reduces the chances of a developer accidentally starting execution with R7 set mid-way through an instruction. Finally, only the address fields of the memory inspectors are actually stored. The data values are always read and written directly from memory. This introduces an interesting wrinkle, as care must be taken to correctly read or write to screen addresses. For things to work correctly, either the game screen must be visible or the reads and writes must be redirected to the screen cache for addresses where the debugger is visible. The first approach is used: the game screen is re-shown, the read or write executed, and then the screen reverted to the debugger. This leads to the flicker seen when interacting with the debugger.
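To make the "skip to the next instruction" behaviour concrete, here is a rough Python sketch of how an instruction's length can be derived from its first word. It is based on my understanding of the CP1610 opcode layout (J/JSR is three words, branches are two, and memory-access instructions are two words in direct or immediate mode, with immediate mode growing to three after an SDBD prefix). It is not a transcription of the debugger's own decoder, and the function names, as well as the choice to treat SDBD plus the following instruction as one unit, are assumptions.

```python
SDBD = 0x0001  # double-byte-data prefix
JUMP = 0x0004  # J / JSR family


def instruction_length(opcode, sdbd=False):
    """Return the length in words of the instruction starting with `opcode`."""
    op = opcode & 0x3FF          # CP1610 opcodes are 10 bits wide
    if op == JUMP:
        return 3                 # opcode plus two address words
    if op < 0x0200:
        return 1                 # implied, register and shift forms
    if op < 0x0240:
        return 2                 # branches: opcode plus displacement
    mode = (op >> 3) & 0x7       # addressing mode of MVO/MVI/ADD/SUB/CMP/AND/XOR
    if mode == 0:
        return 2                 # direct: opcode plus address
    if mode == 7:
        return 3 if sdbd else 2  # immediate: one data word, two after SDBD
    return 1                     # indirect through R1..R6


def next_instruction_address(memory, pc):
    """Address of the instruction that follows the one at `pc`."""
    opcode = memory[pc]
    if (opcode & 0x3FF) == SDBD:
        # Assumption: the prefix and the instruction it modifies move together.
        return pc + 1 + instruction_length(memory[pc + 1], sdbd=True)
    return pc + instruction_length(opcode)
```

For example, with `memory = {0x5000: 0x02B8, 0x5001: 0x0123}` (an MVII with its immediate word), `next_instruction_address(memory, 0x5000)` gives `0x5002` rather than `0x5001`.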
The final set of commands are the gnarly ones. These are the six handlers that deal with executing instructions.
The most fundamental of these commands will execute a single instruction and return to the main UI loop. Step-by-step execution of most instructions is achieved by grabbing a copy of the instruction and writing it to an instruction cache starting at $d2ad in the workspace RAM. This cached instruction is then followed by a JMP to $d8d3. This allows the debugger to regain control of execution, and kicks off the caching of the updated state of the CPU. There are a number of edge cases to this simple plan. The first is that, like the increment of R7, the instruction must be partially decoded to determine whether or not it has any arguments or an SDBD prefix, and if so, these must also be copied into the instruction cache. The second complexity is the set of instructions that explicitly use the program counter R7. Regardless of whether these use R7 as a source or destination, they cannot be executed directly from the cache, because either the value read will be incorrect or the write will result in the loss of debugger control. Therefore, the debugger changes these instructions to use R5 as a surrogate for R7 and swaps the cached values in R7 with those in R5 (if R5 is also used by the instruction, R4 is used as the surrogate instead). The final wrinkle is that some instructions, like JMP and Branch, cannot be executed in this way at all, as to do so would implicitly mean that the debugger loses control of execution. Therefore, these opcodes have to be emulated, updating the cached CPU state, rather than actually being executed. I'll leave the details of how this is done to the full document.
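As an illustration of the simple case, the following Python sketch assembles the contents of the instruction cache: the copied instruction words followed by a jump back to the re-entry point at $d8d3. The jump encoding follows my understanding of the CP1610 J instruction (opcode $0004, then a word holding address bits 15-10 with the "no return register" bits set, then a word holding address bits 9-0). The helper names are hypothetical and the real debugger does this in CP1610 assembly.

```python
CACHE_START = 0xD2AD  # instruction cache address given in the text
REENTRY = 0xD8D3      # where the debugger regains control


def encode_jump(target):
    """Encode a plain J to `target` as three 10-bit words."""
    return [
        0x0004,                            # J / JSR opcode
        0x0300 | ((target >> 8) & 0xFC),   # no return register, address bits 15..10
        target & 0x3FF,                    # address bits 9..0
    ]


def build_instruction_cache(instruction_words, reentry=REENTRY):
    """Instruction words (prefix and operands included) plus the jump back."""
    return list(instruction_words) + encode_jump(reentry)


def write_cache(memory, words, start=CACHE_START):
    """Store the assembled words into a dict modelling workspace RAM."""
    for offset, word in enumerate(words):
        memory[start + offset] = word
```

For a two-word MVII, `build_instruction_cache([0x02B8, 0x0123])` yields `[0x02B8, 0x0123, 0x0004, 0x03D8, 0x00D3]`. The same `encode_jump` helper, given the cached R7 instead of the re-entry address, models the hand-back used by the run commands.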
Having set up the instruction cache with the instruction to be executed, the cached register state is flushed to the CPU and the debugger jumps to the start of the instruction cache. It is important to note that, just like changing field values, before executing the instruction the game screen has to be reinstated by the debugger. This is necessary for reads or writes to screen BACKTAB RAM to obtain the correct value. The debugger also waits for the next VBLANK before running any instruction. This is required to ensure that reads and writes to STIC registers and GRAM are guaranteed to work. It is interesting that this interlock is not applied when reading or writing memory with the memory inspectors. Indeed, testing suggests that both reading and writing STIC register values is unreliable for this reason. The VBLANK constraint has the effect of limiting execution to a maximum of 60 instructions per second. On completion of the single instruction, control returns to the debugger at $d8d3, the updated CPU state is captured, the screen reverted to that of the debugger, if needed the values in R7 and R5 (or R4) are switched back and control returns to the UI loop.
The next, more complicated, user command is unconditional tracing of instructions. In this mode, instructions are executed one at a time, the state of the CPU is cached and the debugger screen redisplayed, but control does not fall into the interactive UI loop. Instead, a check is made to see whether both lower action buttons are pressed. If so, the interactive UI is restarted; otherwise single-step execution continues with the next instruction. Because the screen has to be redrawn with each instruction, this is a flickery old affair. All trace modes make use of the core single-step instruction execution described above, but use a debugger mode flag held in $d2b7 to decide how to proceed at the end of each instruction. Mode 0 (single stepping) returns to the main UI event loop, while Mode 1 (unconditional trace) continues with its controller check at $d3ac.
Mode 2 - trace to the first non-EXEC instruction - is similar to unconditional tracing. In addition to testing the bottom action buttons, it adds a check to see if the cached value of R7 is in the range $1000 - $1fff. If it isn't (indicating that execution is not taking place in the EXEC), tracing stops and the interactive debugger is restarted.
Mode 3 - trace to the end of the current subroutine - is the most complex case. Prior to starting execution, the current cached value of R6 is stored in $d2b5 and $d2b6. Like modes 1 and 2, the bottom action buttons are checked after each instruction is executed. In addition, if the new cached value of R6 is less than that recorded at the start of tracing, control returns to the UI event loop. Provided the stack is not directly tinkered with, this will occur when the current subroutine ends.
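The end-of-instruction decision for the four stepping modes can be summarised in a few lines. This Python sketch restates the rules given above; the mode numbers, the EXEC range and the R6 comparison come from the text, while the function and constant names are made up for illustration.

```python
MODE_STEP = 0            # single step
MODE_TRACE = 1           # unconditional trace
MODE_TRACE_NON_EXEC = 2  # trace until execution leaves the EXEC
MODE_TRACE_RETURN = 3    # trace to the end of the current subroutine

EXEC_START, EXEC_END = 0x1000, 0x1FFF


def should_return_to_ui(mode, r7, r6, start_r6, buttons_pressed):
    """Decide, after one instruction, whether to re-enter the UI loop.

    `r7` and `r6` are the freshly cached register values, `start_r6` is
    the R6 value saved when tracing began, and `buttons_pressed` is True
    when both bottom action buttons are held.
    """
    if mode == MODE_STEP:
        return True
    if buttons_pressed:
        return True                      # every trace mode honours the buttons
    if mode == MODE_TRACE_NON_EXEC:
        return not (EXEC_START <= r7 <= EXEC_END)
    if mode == MODE_TRACE_RETURN:
        return r6 < start_r6             # stack pointer dropped: subroutine returned
    return False                         # unconditional trace keeps going
```

The R6 test relies on the CP1610 stack growing upwards, so a pull that pops the return address leaves R6 below its starting value.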
The final two commands - "unconditional run" and "run to breakpoint" - do not make ongoing use of the single-stepping core. Instead, they hand control back to the program. The only difference between these two run commands is whether a global "breakpoints active" flag, held in $d2ac, is set or cleared prior to carrying out the command. Both commands then make use of the single-step core to hand back control. This is done by writing a JMP instruction, with the currently cached value of R7 as its argument, to the instruction cache and then kicking off the single-stepping core to execute it. The single-stepping code will then revert the screen to that of the game, flush the cached state to the CPU and execute the instruction cache, releasing control to the game.
That is about it in terms of the high level architecture of the debugger (and a few details). I hope this overview illustrates that, whilst the debugger is conceptually relatively simple, there are a number of nasty edge cases that had to be considered. Further, some things that seem strange, such as the flickering screen and the slow rate of tracing instructions, are integral to its correct operation.