Tuesday, August 05, 2014

Part 2 of How to disassemble Sinclair ZX Spectrum Games

In part 1 of this series we covered the different data types and a summary of some legal implications.

In this part we cover creating a disassembly of the Z80 machine code. We will use a disassembler to create the assembler mnemonics - the human readable version of the numbers that the Z80 processor interprets. We are assuming that most of the game is in Z80 machine code, rather than in BASIC or some other high level language. This is true for most commercial games, but not all. We’ll look at the BASIC loader later.

Games not written in Z80 machine code or Sinclair BASIC would need an interpreter written in Z80 machine code anyway, so this step is not wasted.

If you want a game to disassemble (and haven’t chosen one already) then Gulpman might be a reasonable choice because I’ve already done a full disassembly over here, which you can compare your progress with. You can get Gulpman from World of Spectrum but you’ll need to use an emulator to convert it into an SNA file to use with the tools in this article.


File Formats

Tape formats (tzx and tap) are really no good because they don’t contain the memory contents. You could convert them to a memory image (or ‘snapshot’ file) using a ZX Spectrum emulator.

Two example ‘snapshot’ formats are .sna http://www.worldofspectrum.org/faq/reference/formats.htm#File and .z80 http://www.worldofspectrum.org/faq/reference/z80format.htm. These are documented on the great World of Spectrum site. For a 48K Spectrum, both have a header followed by data. The only ‘problem’ with the .z80 format is that the data is compressed. We would need to ‘uncompress’ the data before we use it. If you want to do this, the format is well documented and you could write a simple program to uncompress the data.
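As a rough illustration of such a ‘simple program’, here is a minimal Python sketch of the run-length scheme described in the .z80 format page linked above, where the four bytes ED ED xx yy stand for the byte yy repeated xx times. The function name z80_rle_decompress is my own, and this only expands a block of compressed data: it does not parse the header, and it assumes the caller has already removed any end marker or block length bytes that the format version in use adds.

```python
def z80_rle_decompress(data: bytes) -> bytes:
    """Expand ED ED <count> <value> runs; all other bytes are copied as-is.

    Hypothetical helper: header parsing and end-marker handling are
    deliberately left to the caller.
    """
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        # A run needs four bytes: ED, ED, repeat count, value to repeat.
        if i + 3 < n and data[i] == 0xED and data[i + 1] == 0xED:
            count = data[i + 2]
            value = data[i + 3]
            out.extend(bytes([value]) * count)
            i += 4
        else:
            # Literal byte (this includes a lone ED).
            out.append(data[i])
            i += 1
    return bytes(out)
```

For example, z80_rle_decompress(bytes([0xED, 0xED, 0x05, 0x00])) expands to five zero bytes.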

A raw memory dump of the RAM area (from 0x4000 to 0xFFFF) would also be acceptable. (You could also use a raw memory dump of the ROM, from 0x0000 to 0x3FFF - however, there are plenty of annotated disassemblies of the standard ZX Spectrum ROM on the Internet, including a copy of the book that was published in the 80’s.)

Therefore I’ll assume you are using .sna format below. The utilities I’ll provide with this series of articles assume that format at the moment.
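To make the .sna layout concrete, here is a small Python sketch (not one of the utilities from this series) that splits a 48K snapshot into its header and RAM image and converts between file offsets and Spectrum addresses. It relies on the layout given in the .sna description linked above: a 27 byte header followed by the 48K of RAM that starts at 0x4000. The function names are my own invention.

```python
SNA_HEADER_SIZE = 27   # register dump at the start of a 48K .sna file
RAM_START = 0x4000     # first RAM address on the Spectrum
RAM_SIZE = 0xC000      # 48K of RAM, 0x4000 to 0xFFFF


def split_sna(snapshot: bytes):
    """Return (header, ram) for a 48K .sna image held in memory."""
    if len(snapshot) != SNA_HEADER_SIZE + RAM_SIZE:
        raise ValueError("not a 48K .sna snapshot (unexpected length)")
    return snapshot[:SNA_HEADER_SIZE], snapshot[SNA_HEADER_SIZE:]


def offset_to_address(offset: int) -> int:
    """Map an offset within the .sna file to a Spectrum memory address."""
    if not SNA_HEADER_SIZE <= offset < SNA_HEADER_SIZE + RAM_SIZE:
        raise ValueError("offset is in the header or past the end of RAM")
    return RAM_START + (offset - SNA_HEADER_SIZE)


def address_to_offset(address: int) -> int:
    """Map a Spectrum RAM address to an offset within the .sna file."""
    if not RAM_START <= address < RAM_START + RAM_SIZE:
        raise ValueError("address is outside RAM")
    return SNA_HEADER_SIZE + (address - RAM_START)
```

The RAM part returned by split_sna is the same thing as the raw memory dump mentioned above, so writing it to a file gives you an image for tools that expect no header.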


A Disassembler

You will need a disassembler to convert the Z80 machine code into assembler mnemonics - you can find several on the net.

The one I used is here http://robprobin.com/pmwiki.php?n=Main.ZXSpectrumTools - it has been modified by me. (Please respect the wishes of the original author as per the top of the file).

This disassembler ASSUMES .sna format (it’s hard coded) when calculating the address of the instructions (and making the labels for the disassembly), however, it would be easy to change this assumption.

You will need to compile it with a C++ compiler. For example, assuming you have GCC installed, you could run the following commands at the terminal prompt / command line:

For Windows:

    g++ Z80DIS_rob.cpp -o Z80DIS_rob.exe

For Linux or Mac:

    g++ Z80DIS_rob.cpp -o Z80DIS_rob


Once compiled, you can use it against the snapshot you are trying to disassemble:

On Windows:
    Z80DIS_rob game.sna > game_disassembly.txt

On Linux or Mac:
    ./Z80DIS_rob game.sna > game_disassembly.txt

Output

The file created (game_disassembly.txt in the above case) has both data and code disassembled into instructions. This has several disadvantages:

    1.    The header at the start will be disassembled. Since this is data not code, the instructions here will not be valid.
    2.    Any data in the game will be disassembled. Again - instructions here will be rubbish.
    3.    Sometimes the instructions generated will ‘consume’ the first bytes of valid code.

Especially in the last case, some remedial work is required: manually editing the instructions at this point.

For example:

For a hex dump:

3E 3E 00

Would be disassembled as:

LD A, #3EH
NOP

But if we know the first byte is never executed and is in fact data, then the disassembly looks like:

DB 3EH
LD A, #00H

Unfortunately, you often only find this out as you start to understand how the code works. Have your Z80 instruction reference handy! (See the last section for web references.)
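The 3E 3E 00 example above can be reproduced with a toy decoder. The Python sketch below is not the disassembler used in this article: it only knows two opcodes (00 for NOP and 3E for LD A,n), prints anything else as a DB line, and the tiny_disassemble name and data_prefix parameter are made up for the illustration. It shows how telling the decoder that the first byte is data shifts every instruction that follows.

```python
def tiny_disassemble(code: bytes, data_prefix: int = 0):
    """Decode bytes into mnemonics, treating the first data_prefix bytes as data.

    Toy decoder for illustration only: it understands NOP (00) and
    LD A,n (3E n) and emits DB for everything else.
    """
    lines = []
    i = 0
    while i < len(code):
        if i < data_prefix:
            # Bytes we have decided are data, not code.
            lines.append(f"DB {code[i]:02X}H")
            i += 1
        elif code[i] == 0x00:
            lines.append("NOP")
            i += 1
        elif code[i] == 0x3E and i + 1 < len(code):
            # LD A,n consumes the opcode and one operand byte.
            lines.append(f"LD A, #{code[i + 1]:02X}H")
            i += 2
        else:
            lines.append(f"DB {code[i]:02X}H")
            i += 1
    return lines


dump = bytes([0x3E, 0x3E, 0x00])
print(tiny_disassemble(dump))                 # decode from the first byte
print(tiny_disassemble(dump, data_prefix=1))  # first byte marked as data
```

The first call gives LD A, #3EH followed by NOP, and the second gives DB 3EH followed by LD A, #00H, matching the two listings above.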


In the next parts we will look at an example disassembly in more detail and also at extracting text strings, hex for data and graphics.

