If you've spent your programming career in languages like Python, JavaScript, C#, or even C, stepping into Z80 assembly language can feel like travelling back to the bare-metal origins of computing. Everything you take for granted (variables, expressions, function calls, loops) must be constructed manually from the most primitive operations imaginable.
The Z80 is an 8-bit microprocessor released by Zilog in 1976. It powered countless home computers of the late 1970s and 1980s, including the ZX Spectrum, Amstrad CPC, MSX machines, and the TRS-80. It also found its way into arcade machines and embedded systems, and the Nintendo Game Boy uses a closely related Sharp CPU that shares much of the Z80's instruction set but is not a true Z80. Understanding Z80 assembly gives you direct insight into how computers actually work at the lowest level.
This guide assumes you can already program. We won't explain what a loop is conceptually—we'll show you how to build one from nothing but jumps and flags. Each section maps a familiar high-level concept to its Z80 equivalent, highlighting what changes and what stays the same.
Throughout this guide, all examples are written to be assembled using Pasmo with the --tapbas flag, targeting the ZX Spectrum. This means every example can be assembled into a .tap file with an auto-running BASIC loader, ready to load directly into an emulator such as Fuse.
Throughout this guide, examples use three simple routines to display values on screen. These are your equivalent of console.log() or print() — a way to see what your code is actually doing. When you add two numbers and want to check the result, or need to verify that a conditional branch went the right way, these routines let you print the answer to the screen.
The routines use the ZX Spectrum ROM, so they work immediately with Pasmo's --tapbas flag. Save the code below as debug_output.asm and include it in your programs with INCLUDE "debug_output.asm", or paste it directly at the end of your source files.
Don't worry if some of the assembly instructions below are unfamiliar — they are all explained in detail in the chapters that follow. For now, the important thing is understanding what each routine does and how to call it.
Before you write your first line of assembly, it helps to have an editor that understands Z80 syntax. Visual Studio Code is a free, cross-platform editor that works well for this guide. Download and install it from https://code.visualstudio.com/ for Windows, macOS, or Linux.
Once VS Code is installed, add the Z80 Assembly extension by Imanolea. It provides syntax highlighting for .asm files — coloured keywords, register names, directives, and comments — making your source code much easier to read and navigate.
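To install it, open the Extensions view in VS Code (Ctrl+Shift+X, or Cmd+Shift+X on macOS), search for "Z80 Assembly", and click Install on the entry published by Imanolea.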
Alternatively, install it from the command line:
code --install-extension imanolea.z80-asm
With the extension active, .asm files will have coloured keywords, register names, and comments.
Every example in this guide is assembled using Pasmo with the --tapbas flag. This flag wraps your machine code in a TAP file with an auto-running BASIC loader, so the Spectrum loads and starts your program immediately when the tape is played. Without it you would have to type RANDOMIZE USR 32768 manually every time.
The general form of the command is:
pasmo --tapbas program.asm program.tap
For example, to assemble a file called hello.asm:
pasmo --tapbas hello.asm hello.tap
If Pasmo finds an error it will print the offending line number and a short message. Fix the error and re-run the command. When assembly succeeds with no output, your .tap file is ready.
Fuse is a cycle-accurate ZX Spectrum emulator. Once you have a .tap file, pass it directly to Fuse on the command line and it will auto-load the tape:
fuse --tape program.tap --auto-load
So for the example above:
fuse --tape hello.tap --auto-load
Fuse opens, plays the tape, the BASIC loader runs, and your program starts. The output from your debug routines appears on the emulated screen. Close the Fuse window when you are done, edit your source file, re-assemble with Pasmo, and run again.
Your typical edit–assemble–run cycle from the terminal therefore looks like this (on Linux / Mac systems):
pasmo --tapbas hello.asm hello.tap && fuse --tape hello.tap --auto-load
Chaining the commands with && means Fuse only launches if Pasmo succeeds — a failed assembly stops the chain before the emulator opens.
The ZX Spectrum can send output to different destinations: the main screen, the lower screen (where INPUT prompts appear), or a printer. Each destination has a number: 1 is the lower screen, 2 is the main screen, and 3 is the printer. Strictly speaking the ROM calls these numbers 'streams', each attached to a 'channel' (K, S, or P), but this guide simply refers to them as channels. Before you can print anything, you must tell the Spectrum which channel to use.
The ROM has a built-in routine at address 0x1601 that opens a channel. You tell it which one by putting the channel number in the A register before calling it. We want the main screen, so we put 2 in A.
debug_init:
LD A, 2 ; Put 2 in A (channel 2 = main screen)
CALL 0x1601 ; Call the ROM routine that opens a channel
RET ; Return to the caller
Call this once at the very start of your program. After that, the screen is ready and you can print to it freely.
In a high-level language, you'd write something like print('A') — you call a function and pass the character as an argument. In assembly, there are no function arguments in that sense. Instead, you put the value in a register before calling the routine. The routine then reads it from that register. The A register is used here — whatever value is in A when you call print_char is the character that gets printed.
The actual printing is done by RST 0x10, a ROM routine built into the Spectrum. RST is a special fast CALL to a fixed address. RST 0x10 prints whatever character is in the A register to whichever channel was last opened (which is why we call debug_init first).
print_char:
RST 0x10 ; Print whatever character is in A
RET ; Return to the caller
To use it, load the character into A first, then call:
LD A, '*' ; Put ASCII code for '*' into A
CALL print_char ; Prints: *
For a newline, load 13 (carriage return) into A and call RST 0x10 directly — there is no need for a separate routine when it is just two instructions:
LD A, 13 ; 13 = carriage return
RST 0x10 ; Move to the next line
A string in assembly is just a sequence of bytes in memory, one per character, with a zero byte at the end to mark where the string finishes. This zero byte is called the null terminator — it is how the routine knows when to stop printing. You define a string like this:
msg: DB "Hello!", 0 ; The 0 at the end is the null terminator
Let's break this line down. DB stands for "Define Byte" — it tells the assembler to place raw bytes directly into memory at this point rather than generating a machine instruction. The text in quotes, "Hello!", is a shorthand — the assembler converts each character to its ASCII code and stores them as consecutive bytes (72, 101, 108, 108, 111, 33). The comma separates items in the DB list, so you can mix quoted strings with individual byte values. Here, the 0 after the comma is a single byte with the value zero — the null terminator. It does not represent the character '0' (which would be ASCII 48); it is literally the value 0, and it marks the end of the string so our print routine knows when to stop.
The label msg is just a name for the memory address where the string starts. To print it, we need to tell the routine where to find the string. We do this by loading the address into the HL register pair. HL is a 16-bit register pair — H stands for High and L stands for Low, and together they hold a single 16-bit value. It is commonly used as a pointer — it holds the address of something in memory rather than the data itself.
print_string:
LD A, (HL) ; Read the byte at the address HL points to
OR A ; Is it zero? (sets the Zero flag if A is 0)
RET Z ; If zero, we've hit the null terminator — stop
RST 0x10 ; Not zero, so print this character
INC HL ; Move HL to point to the next byte
JR print_string ; Loop back and do the next character
Walking through it: LD A, (HL) reads one byte from the memory address that HL points to and puts it in A. The parentheses mean 'the contents of the address'. Without them the operand would be the register pair itself, and LD A, HL is not even a valid instruction, because a 16-bit pair cannot fit into the 8-bit A register. OR A is a quick way to check if A is zero (it sets the Zero flag without changing A's value). RET Z returns if the Zero flag is set, meaning we've reached the null terminator. Otherwise, RST 0x10 prints the character, INC HL moves the pointer forward by one byte to the next character, and JR print_string jumps back to the top of the loop.
When we write LD HL, msg, it is important to understand what HL contains afterwards. HL does not hold the string itself — it cannot, because HL is only two bytes wide, and a string can be any length. What HL holds is the memory address where the first character of the string is stored. It is a pointer to the beginning of the string, nothing more.
The string itself lives in memory as a sequence of consecutive bytes laid out by the DB directive. Suppose msg ends up at address $8000. Memory would look like this:
; Address Value Character
; $8000 72 'H' <-- HL points here ($8000)
; $8001 101 'e'
; $8002 108 'l'
; $8003 108 'l'
; $8004 111 'o'
; $8005 33 '!'
; $8006 0 null terminator
After LD HL, msg, HL contains $8000. When we execute LD A, (HL), the CPU goes to address $8000 and reads the byte there — 72, the ASCII code for 'H'. Now here is why INC HL works: it adds 1 to the address in HL, changing it from $8000 to $8001. The next time around the loop, LD A, (HL) reads from $8001 and gets 101 — the 'e'. Each time through the loop, INC HL advances the pointer by one address, and because the characters are stored in consecutive bytes, this moves us to the next character in the string.
In a high-level language like Python, you might write s[i] and increment i — you keep a base address and an index, and the language computes the position for you. In assembly, we skip that and hold the position directly. HL is our position in the string. Incrementing it moves our position forward by one byte, which is exactly one character. There is no separate index variable, no base-plus-offset calculation — just a pointer that we walk forward through memory, one byte at a time, until we hit the null terminator.
To use it:
LD HL, msg ; Point HL at the start of the string
CALL print_string ; Prints: Hello!
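The same pointer walk can do more than print. As a minimal sketch, the program below counts the characters in a null-terminated string instead of printing them. The routine name str_len and its register conventions are assumptions made for this illustration; it is not part of debug_output.asm, which is assumed to sit in the same folder.

```asm
        ORG 0x8000
        CALL debug_init     ; Open the screen channel
        LD HL, msg          ; Point HL at the start of the string
        CALL str_len        ; B = number of characters
        LD A, B
        CALL print_number   ; Prints: 6
        RET                 ; Return to BASIC

; str_len: hypothetical helper, not part of debug_output.asm
; Input:  HL = address of a null-terminated string
; Output: B  = character count (wraps past 255)
;         HL = address of the null terminator
str_len:
        LD B, 0             ; Count starts at zero
str_len_loop:
        LD A, (HL)          ; Read the byte HL points to
        OR A                ; Zero flag is set if it is the terminator
        RET Z               ; End of string: B holds the length
        INC B               ; Count this character
        INC HL              ; Advance the pointer by one byte
        JR str_len_loop     ; Check the next character

msg:    DB "Hello!", 0
        INCLUDE "debug_output.asm"
        END 0x8000
```

The loop has the same shape as print_string; only the work done per character changes.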
This routine prints a number (0–255) as readable digits on screen. For example, if A contains 42, it prints the characters '4' and '2'. The Spectrum ROM has built-in routines that handle the conversion from a number to its decimal digits, so we don't need to do the maths ourselves.
The ROM's number-printing routine expects the number in the BC register pair (a 16-bit register, B is the high byte, C is the low byte). Since our number is 8-bit and sits in A, we need to put it into BC with B set to 0 (the high byte is zero because our number fits in a single byte) and C set to our number.
We also need to preserve the BC register. Other code in your program might be using B or C for something else, and if we overwrite them here, that code would break. PUSH BC saves BC's current value onto the stack (a temporary storage area) before we change it, and POP BC restores the original value when we're done.
print_number:
PUSH BC ; Save BC so we don't destroy the caller's values
LD B, 0 ; High byte = 0 (our number is only 8-bit)
LD C, A ; Low byte = the number we want to print
CALL 0x2D2B ; ROM routine STACK_BC: takes BC and puts it on the calculator stack, ready for printing
CALL 0x2DE3 ; ROM routine: prints the number as decimal digits
POP BC ; Restore BC to its original value
RET ; Return to the caller
To use it, just load your number into A and call:
LD A, 42
CALL print_number ; Prints: 42
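The same two ROM calls are not limited to 8-bit values. As a hedged sketch, assuming the standard 48K ROM entry points STACK_BC at 0x2D2B (which takes the full 16-bit value in BC) and PRINT_FP at 0x2DE3, a 16-bit variant could look like the program below. The name print_number16 is made up for this illustration and is not part of debug_output.asm. Like print_number, it should be expected to clobber A, DE and HL.

```asm
        ORG 0x8000
        CALL debug_init     ; Open the screen channel
        LD BC, 1000         ; A value too large for a single byte
        CALL print_number16 ; Prints: 1000
        RET                 ; Return to BASIC

; print_number16: hypothetical 16-bit variant of print_number
; Input: BC = unsigned number to print (0-65535)
print_number16:
        PUSH BC             ; Save BC, mirroring print_number
        CALL 0x2D2B         ; ROM STACK_BC: put BC on the calculator stack
        CALL 0x2DE3         ; ROM PRINT_FP: print the value as decimal digits
        POP BC              ; Restore BC
        RET

        INCLUDE "debug_output.asm"
        END 0x8000
```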
Before you can call the routines, you need to create the file they live in. Open VS Code, paste the code below into a new file, and save it as debug_output.asm in the same folder as your programs. Every example in this guide that uses INCLUDE "debug_output.asm" expects this file to be there.
; =============================================================
; debug_output.asm - Debug Display Routines for ZX Spectrum
; =============================================================
; INCLUDE this file in your program or copy the routines below.
;
; Call debug_init once at the start of your program to set up
; the screen channel. Then use:
;
; print_char - Print a single character. Input: A = char
; print_string - Print a null-terminated string. Input: HL
; print_number - Print an 8-bit number (0-255). Input: A
;
; For a newline, just do: LD A, 13 / RST 0x10
; =============================================================
; --- Call once at program start to open screen channel ---
debug_init:
LD A, 2 ; Channel 2 = main screen
CALL 0x1601 ; ROM: open channel
RET
; --- Print a single character in A ---
print_char:
RST 0x10 ; ROM: print character in A
RET
; --- Print null-terminated string at HL ---
print_string:
LD A, (HL)
OR A ; Check for null terminator
RET Z ; Return if end of string
RST 0x10 ; ROM: print character
INC HL
JR print_string
; --- Print 8-bit unsigned number in A (0-255) ---
print_number:
PUSH BC
LD B, 0
LD C, A ; BC = the number
CALL 0x2D2B ; ROM: STACK_BC - push BC onto calculator stack
CALL 0x2DE3 ; ROM: PRINT_FP - print the number
POP BC
RET
; =============================================================
; END OF DEBUG ROUTINES
; =============================================================
Call debug_init once at the start of your program. After that, use the three routines freely. Here is a complete program:
ORG 0x8000
CALL debug_init ; Set up screen output (do this once)
LD HL, msg ; Point HL at our string
CALL print_string ; Prints: Hello!
LD A, 13
RST 0x10 ; Newline
LD A, 42
CALL print_number ; Prints: 42
LD A, 13
RST 0x10 ; Newline
LD A, '*'
CALL print_char ; Prints: *
RET ; Return to BASIC
msg: DB "Hello!", 0
INCLUDE "debug_output.asm"
END 0x8000
These routines appear in examples throughout the guide. The full source is also reproduced in the appendix for easy reference.
In a high-level language, you install a runtime or compiler, create a source file, and run a command. Python gives you an interpreter, Node gives you a runtime, C gives you a compiler that produces an executable. The toolchain handles the translation from your text to something the machine understands.
For Z80 assembly you need two things: Pasmo (the assembler) to translate your mnemonics into machine code, and an emulator to run the result. That's it.
Pasmo is a portable Z80 cross-assembler. When you use the --tapbas flag, it produces a .tap file that includes a BASIC loader program. This loader automatically runs your machine code when the tape is loaded, so you don't need to manually POKE addresses or type RANDOMIZE USR commands.
Emulator: For ZX Spectrum development, Fuse is excellent and freely available for all major operating systems. It loads .tap files directly.
The development cycle is simple. Write your assembly code in a .asm file. Assemble it with Pasmo. Load the resulting .tap file in Fuse. The example below is similar to the print_char routine we created above:
; Save this as first.asm
ORG 0x8000 ; Code starts at address 32768
LD A, 2 ; Channel 2 = main screen
CALL 0x1601 ; ROM: open channel
LD A, 42 ; ASCII '*'
RST 0x10 ; ROM: print character
RET ; Return to BASIC
END 0x8000 ; Entry point for BASIC loader
Assemble:
pasmo --tapbas first.asm first.tap
Then open first.tap in Fuse.
fuse --tape first.tap --auto-load
The BASIC loader runs automatically, executing your code. You should see an asterisk printed on screen.
The ORG 0x8000 directive tells the assembler where your code will live in memory (address 32768, safely above the Spectrum's BASIC area). The END 0x8000 directive tells Pasmo's BASIC loader what address to call with RANDOMIZE USR. RET at the end returns control back to BASIC.
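Key Difference: There's no operating system in the traditional sense. The ZX Spectrum ROM provides some useful routines (like printing characters), but when your program ends with RET, control returns to the Spectrum's BASIC interpreter. HALT is not a substitute for RET: it only pauses the processor until the next interrupt, after which execution carries on into whatever bytes follow your code. If interrupts are disabled (DI followed by HALT), the processor stops for good and you must reset the machine.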
In a high-level language, you declare variables with names and types: let count = 0, int age = 25, name = "Alice". The language manages where these values are stored, handles type conversions, and lets you refer to them by name throughout your code.
In Z80 assembly, there are no variables in the high-level sense. You have two places to store data: registers and memory. That's it.
The Z80 has a small set of registers—tiny, fast storage locations inside the processor itself:
A (Accumulator): The primary register for arithmetic and logic. Most operations use A as one operand and store results back in A.
B, C, D, E, H, L: General-purpose 8-bit registers. Can also be paired as BC, DE, and HL for 16-bit values.
HL: Often used as a pointer to memory. Many instructions work with the memory location that HL points to, written as (HL).
IX and IY: Index registers for accessing data at fixed offsets, like fields in a structure.
SP: Stack pointer—points to the current top of the stack.
PC: Program counter—the address of the next instruction. You don't manipulate this directly; jumps and calls change it.
F: Flags register—holds the results of comparisons and arithmetic (zero, carry, sign, etc.).
There are only two 'types': bytes (8-bit, 0–255 or –128 to 127 if signed) and words (16-bit, 0–65535). Everything else—strings, arrays, structures—you build yourself from these primitives.
; Defining data in memory
my_byte: DB 42 ; Define a single byte with value 42
my_word: DW 1000 ; Define a 16-bit word (stored little-endian)
my_string: DB "Hello", 0 ; String is just bytes, null-terminated
my_array: DB 1, 2, 3, 4, 5 ; Array is just consecutive bytes
DB means 'define byte', DW means 'define word'. The labels (my_byte, my_word, etc.) are just names for memory addresses—the assembler replaces them with actual numbers.
; Loading values
LD A, 10 ; Load immediate value 10 into A
LD B, A ; Copy A into B
LD A, (my_byte) ; Load value FROM memory address my_byte into A
LD HL, my_array ; Load the ADDRESS of my_array into HL
LD A, (HL) ; Load the byte that HL points to into A
; Storing values
LD (my_byte), A ; Store A's value TO memory address my_byte
LD (HL), 99 ; Store 99 at the address HL points to
Parentheses mean "the contents at this address". This is the single most important syntax rule in Z80 assembly. Without parentheses, you get the address itself; with parentheses, you get the value stored at that address. Compare these two instructions:
LD A, (my_byte) ; A = the value stored at my_byte's address (e.g. 42)
LD HL, my_byte ; HL = the address of my_byte itself (e.g. $8000)
In the first line, the parentheses tell the CPU: "go to the address labelled my_byte, read the byte stored there, and put it in A." In the second line, there are no parentheses, so the CPU simply loads the address number into HL — it never looks at what is stored there. This is the difference between reading a value from a postbox and writing down the postbox's location. Both are useful, but mixing them up is one of the most common beginner mistakes in Z80 programming.
To verify that your loads and stores are working, use the debug routines:
LD A, (my_byte) ; Load the value
CALL print_number ; Display it on screen - should show 42
LD A, 13
RST 0x10 ; Newline
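The same technique makes the little-endian layout of a DW value visible. This is a small sketch, assuming debug_output.asm is in the same folder; it reads the two bytes of my_word one at a time and prints each with print_number.

```asm
        ORG 0x8000
        CALL debug_init     ; Open the screen channel
        LD A, (my_word)     ; The low byte is stored first
        CALL print_number   ; Prints: 232
        LD A, 32
        RST 0x10            ; Space
        LD A, (my_word + 1) ; The high byte is stored second
        CALL print_number   ; Prints: 3
        LD A, 13
        RST 0x10            ; Newline
        RET                 ; Return to BASIC

my_word: DW 1000            ; 1000 = 0x03E8, stored as bytes 0xE8, 0x03

        INCLUDE "debug_output.asm"
        END 0x8000
```

The two numbers recombine as 3 * 256 + 232 = 1000, which is the value given to DW.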
Key Difference: There's no type checking, no automatic conversion, no protection. If you write a 16-bit value to an 8-bit location, you'll overwrite whatever's next in memory. You must track what's stored where.
Copy the program below into VS Code, save it as variables.asm in the same folder as debug_output.asm, then assemble and run it. It loads and displays three of the data items defined above (the byte, the string, and the array) so you can see exactly what gets stored where.
For the array, we cannot simply load HL with the base address and walk forward with INC HL, because the ROM routines called inside print_number use HL internally and leave it pointing somewhere else — so any pointer you stored in HL is gone by the time the call returns. The fix is to preserve HL across each call using the stack. PUSH HL saves HL's current value onto the stack before the call, and POP HL restores it afterwards. Then INC HL advances the pointer to the next element as normal. This push-call-pop pattern is the standard Z80 technique for protecting a register across a subroutine call that might clobber it.
ORG 0x8000
CALL debug_init ; Open the screen channel (do this first)
; --- my_byte (a single 8-bit value: 42) ---
LD HL, label_byte
CALL print_string ; Print label
LD A, (my_byte) ; Load the byte from memory
CALL print_number ; Prints: 42
LD A, 13
RST 0x10 ; Newline
; --- my_string (null-terminated: "Hello") ---
LD HL, label_str
CALL print_string ; Print label
LD HL, my_string
CALL print_string ; Prints: Hello
LD A, 13
RST 0x10 ; Newline
; --- my_array (5 consecutive bytes: 1 2 3 4 5) ---
; PUSH HL before each print_number call to protect our pointer,
; POP HL after to restore it, then INC HL to advance to the next element.
LD HL, label_arr
CALL print_string ; Print label
LD HL, my_array ; Point HL at the first element
LD A, (HL)
PUSH HL ; Save HL — print_number will clobber it
CALL print_number ; Prints: 1
POP HL ; Restore HL so we know where we are
LD A, 32
RST 0x10 ; Space
INC HL ; Advance to element 2
LD A, (HL)
PUSH HL
CALL print_number ; Prints: 2
POP HL
LD A, 32
RST 0x10 ; Space
INC HL ; Advance to element 3
LD A, (HL)
PUSH HL
CALL print_number ; Prints: 3
POP HL
LD A, 32
RST 0x10 ; Space
INC HL ; Advance to element 4
LD A, (HL)
PUSH HL
CALL print_number ; Prints: 4
POP HL
LD A, 32
RST 0x10 ; Space
INC HL ; Advance to element 5
LD A, (HL)
CALL print_number ; Prints: 5 (no PUSH/POP needed — we're done with HL)
LD A, 13
RST 0x10 ; Newline
RET ; Return to BASIC
; --- Data ---
my_byte: DB 42
my_string: DB "Hello", 0
my_array: DB 1, 2, 3, 4, 5
; --- Output labels ---
label_byte: DB "my_byte: ", 0
label_str: DB "my_string: ", 0
label_arr: DB "my_array: ", 0
INCLUDE "debug_output.asm"
END 0x8000
Expected output on screen:
my_byte: 42
my_string: Hello
my_array: 1 2 3 4 5
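The five near-identical array blocks in variables.asm can also be folded into a loop. The sketch below is one possible shape, not the only one. It relies on print_number preserving BC (the version in this guide does, through its PUSH BC and POP BC) and on RST 0x10 leaving B and HL intact, which print_string already depends on for HL. The label print_array_loop is made up for this example.

```asm
        ORG 0x8000
        CALL debug_init     ; Open the screen channel
        LD HL, my_array     ; Point HL at the first element
        LD B, 5             ; Number of elements to print
print_array_loop:
        LD A, (HL)          ; Fetch the current element
        PUSH HL             ; print_number clobbers HL, so save the pointer
        CALL print_number   ; Print the element
        POP HL              ; Restore the pointer
        LD A, 32
        RST 0x10            ; Space
        INC HL              ; Advance to the next element (flags unchanged)
        DEC B               ; One fewer element left; Zero flag set at 0
        JR NZ, print_array_loop
        LD A, 13
        RST 0x10            ; Newline
        RET                 ; Return to BASIC

my_array: DB 1, 2, 3, 4, 5

        INCLUDE "debug_output.asm"
        END 0x8000
```

The output matches the array line above, apart from an invisible trailing space after the 5.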
In a high-level language, expressions combine values with operators: result = (a + b) * c - d / 2. The compiler figures out the order of operations, allocates temporary storage, and generates all necessary instructions automatically.
In Z80 assembly there are no expressions. Each operation is a separate instruction, and results almost always go into the accumulator (A). To compute (a + b) * c, you must break it down step by step.
; Addition
ADD A, B ; A = A + B
ADD A, 5 ; A = A + 5
; Subtraction
SUB B ; A = A - B
SUB 10 ; A = A - 10
; Increment / Decrement
INC A ; A = A + 1
DEC B ; B = B - 1
INC HL ; HL = HL + 1 (16-bit increment)
These instructions work on individual bits. Each bit in A is compared with the corresponding bit in the other register, one bit at a time. The examples below use the % prefix to write numbers in binary (e.g. %00000001 is the value 1, with each digit representing one bit). This makes it easy to see exactly which bits are being affected:
Bitwise vs logical — an important distinction. In a high-level language, AND and OR are usually logical operators that treat entire values as either true or false: 3 AND 5 evaluates to true because both values are non-zero. On the Z80, AND and OR are bitwise — they operate on each of the 8 bits independently. This means the result depends on the bit patterns, not the overall values. For example, if A = 3 and B = 5:
; A = 3 → 00000011
; B = 5 → 00000101
; AND B result → 00000001 (only bit 0 is 1 in both — result is 1, not 3 or 5)
; OR B result → 00000111 (bits 0, 1, and 2 are set in at least one — result is 7)
If you come from Python or C and expect AND to mean "are both of these things true?", you will get unexpected results. The Z80 does not care whether a value is "truthy" — it lines up the bits and compares them one by one. There is no logical AND or OR on the Z80. You get bitwise operations only, and they are powerful once you think in terms of individual bits rather than whole numbers.
AND B ; A = A AND B
OR C ; A = A OR C
XOR A ; A = A XOR A (quick way to set A to 0)
CPL ; A = NOT A (flip every bit)
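To check the 3 and 5 example on a real screen, a short program along these lines should do. It is a sketch that assumes debug_output.asm is in the same folder.

```asm
        ORG 0x8000
        CALL debug_init     ; Open the screen channel
        LD A, 3             ; %00000011
        LD B, 5             ; %00000101
        AND B               ; Only bit 0 is set in both
        CALL print_number   ; Prints: 1
        LD A, 32
        RST 0x10            ; Space
        LD A, 3             ; Reload A, since AND overwrote it
        LD B, 5
        OR B                ; Bits 0, 1 and 2 are set in at least one
        CALL print_number   ; Prints: 7
        LD A, 13
        RST 0x10            ; Newline
        RET                 ; Return to BASIC

        INCLUDE "debug_output.asm"
        END 0x8000
```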
AND — a bit in the result is 1 only if that bit is 1 in both inputs. If either bit is 0, the result is 0. This is useful for masking out bits you don't care about. For example, AND with 00001111 keeps only the lower four bits and clears the upper four.
; Reading the keyboard on the Spectrum returns a byte where
; each bit represents a key (0 = pressed, 1 = not pressed).
; The value in A before the IN selects which half-row of keys is read.
; To check just bit 0 (the outermost key, here CAPS SHIFT), mask off everything else:
LD A, 0xFE ; Select the half-row CAPS SHIFT, Z, X, C, V
IN A, (0xFE) ; Read keyboard port for that half-row
AND %00000001 ; Keep only bit 0, all other bits become 0
; A is now either 0 (key pressed) or 1 (key not pressed)
OR — a bit in the result is 1 if that bit is 1 in either input (or both). The only way a bit ends up 0 is if it is 0 in both inputs. This is useful for setting specific bits without disturbing the others.
; The Spectrum screen attributes byte controls ink, paper, bright and flash.
; To turn on BRIGHT (bit 6) without changing the colours:
LD A, (HL) ; Read current attribute byte
OR %01000000 ; Force bit 6 to 1 — BRIGHT on
LD (HL), A ; Write it back — colours unchanged, now bright
XOR — a bit in the result is 1 if the two inputs are different, and 0 if they are the same. This is why XOR A with itself always gives 0 — every bit matches, so every result bit is 0. It is the fastest way to clear A on the Z80 (one byte, four T-states, versus two bytes for LD A, 0). XOR is also useful for toggling bits on and off:
; Toggle FLASH (bit 7) on a screen attribute — if it was on, turn it off;
; if it was off, turn it on:
LD A, (HL) ; Read current attribute byte
XOR %10000000 ; Flip bit 7 — toggles FLASH
LD (HL), A ; Write it back
CPL — flips every bit in A. Every 1 becomes 0 and every 0 becomes 1. For example, if A is 00110101, after CPL it becomes 11001010. One common use is inverting a sprite so it shows up against any background:
; To draw a sprite using XOR (so it can be erased by drawing again),
; you sometimes need the inverted version of the graphic data:
LD A, (HL) ; Read a byte of sprite data
CPL ; Invert it — every pixel flips
; Now A holds the negative image of that sprite row
Let's compute result = (10 + 5) * 2 and verify the answer:
LD A, 10 ; A = 10
LD B, 5 ; B = 5
ADD A, B ; A = 10 + 5 = 15
ADD A, A ; A = 15 * 2 = 30 (adding to itself)
; Verify the result using debug output
CALL print_number ; Should display 30
Notice that the multiplication was done with an addition: the Z80 has no multiply instruction. To multiply by 2, you add the value to itself or use a shift instruction. For general multiplication, you must implement it with loops or lookup tables.
Key Difference: The Z80 has no MUL or DIV instructions. Multiplication and division must be implemented in software, typically through repeated addition/subtraction or clever use of shifts. This is why early games often used powers of 2 for sizes and speeds.
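As an illustration of the repeated-addition approach, here is a minimal sketch of an 8-bit multiply. The routine name mul8 and its register conventions are assumptions made for this example, and it is deliberately the simplest version rather than the fastest. It keeps only the low 8 bits of the product, so the result is only meaningful when it fits in 0-255.

```asm
        ORG 0x8000
        CALL debug_init     ; Open the screen channel
        LD B, 6             ; How many times to add
        LD C, 7             ; The value to add each time
        CALL mul8           ; A = 6 * 7
        CALL print_number   ; Prints: 42
        RET                 ; Return to BASIC

; mul8: hypothetical helper, A = B * C by repeated addition
; Only the low 8 bits of the product are kept; B is left at 0
mul8:
        LD A, B
        OR A                ; Is the multiplier zero?
        RET Z               ; Yes: A is already 0, which is the answer
        XOR A               ; Running total starts at 0
mul8_loop:
        ADD A, C            ; Add the multiplicand once more
        DEC B               ; One fewer addition to go
        JR NZ, mul8_loop    ; Repeat until B reaches 0
        RET

        INCLUDE "debug_output.asm"
        END 0x8000
```

The running time grows with the value in B, which is one reason shifts and lookup tables were preferred in speed-critical code.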
Copy the program below into VS Code, save it as bitwise.asm alongside debug_output.asm, then assemble and run it. Rather than just printing numbers, this example uses AND and OR to directly manipulate the ZX Spectrum's attribute memory — producing a visible effect on screen.
Every character cell on the Spectrum screen has a corresponding attribute byte in memory, starting at address 0x5800. The byte packs four pieces of information into a single 8-bit value:
; Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
; Flash Bright Paper Paper Paper Ink Ink Ink
;
; Flash=1 - the cell swaps its ink and paper colours about every 0.32 seconds (every 16 frames)
; Bright=1 - makes both colours more vivid
; Paper (bits 5:3) - background colour 0-7
; Ink (bits 2:0) - character colour 0-7
This is exactly the kind of problem AND and OR are designed for. OR sets specific bits to 1 without touching anything else — perfect for switching a flag on. AND clears specific bits to 0 without touching anything else — perfect for switching a flag off. The colour bits are never disturbed because only the flag bits are included in the mask.
CALL 0x0DAF is the Spectrum ROM's CLS routine. It clears the screen and moves the cursor to position (0, 0), so we know our text will land at attribute address 0x5800. Without it, text would appear wherever the BASIC loader left the cursor, and we would not know which attribute cells to modify.
ORG 0x8000
CALL 0x0DAF ; ROM: CLS — clear screen, cursor to (0,0)
CALL debug_init ; Open screen channel
LD HL, msg
CALL print_string ; "FLASH!" appears at the top of the screen
; Use OR to set FLASH (bit 7) and BRIGHT (bit 6) on each of the 6 characters.
; OR %11000000 forces both bits to 1 without touching the colour bits.
LD A, (0x5800)
OR %11000000 ; Set Flash + Bright on 'F'
LD (0x5800), A
LD A, (0x5801)
OR %11000000 ; Set Flash + Bright on 'L'
LD (0x5801), A
LD A, (0x5802)
OR %11000000 ; Set Flash + Bright on 'A'
LD (0x5802), A
LD A, (0x5803)
OR %11000000 ; Set Flash + Bright on 'S'
LD (0x5803), A
LD A, (0x5804)
OR %11000000 ; Set Flash + Bright on 'H'
LD (0x5804), A
LD A, (0x5805)
OR %11000000 ; Set Flash + Bright on '!'
LD (0x5805), A
; Now use AND to clear just the FLASH bit (bit 7) on the first character.
; AND %01111111 forces bit 7 to 0 while leaving everything else unchanged.
; 'F' stops flashing but stays bright — the rest of the word keeps flashing.
LD A, (0x5800)
AND %01111111 ; Clear Flash on 'F' only
LD (0x5800), A
RET ; Return to BASIC
msg: DB "FLASH!", 0
INCLUDE "debug_output.asm"
END 0x8000
You should see FLASH! appear at the top of the screen. The letters LASH! will flash, swapping between black-on-white and white-on-black roughly every 0.32 seconds (every 16 frames). The leading F will stay steady: it has Bright set (via OR) but Flash cleared (via AND). That difference in behaviour is caused by a single bit in the attribute byte.
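The six repeated OR blocks in bitwise.asm can be collapsed by walking a pointer through attribute memory. The version below is a sketch of that idea and should behave the same as the unrolled program. The label attr_loop is made up for this example, and debug_output.asm is assumed to be in the same folder.

```asm
        ORG 0x8000
        CALL 0x0DAF         ; ROM: clear screen, cursor to (0,0)
        CALL debug_init     ; Open screen channel
        LD HL, msg
        CALL print_string   ; "FLASH!" appears at the top of the screen

        LD HL, 0x5800       ; Attribute byte of the first character cell
        LD B, 6             ; Six characters in "FLASH!"
attr_loop:
        LD A, (HL)          ; Read the current attribute
        OR %11000000        ; Set Flash (bit 7) and Bright (bit 6)
        LD (HL), A          ; Write it back, colour bits untouched
        INC HL              ; Move to the next character cell
        DEC B               ; One fewer cell to go
        JR NZ, attr_loop

        LD A, (0x5800)
        AND %01111111       ; Clear Flash on 'F' only
        LD (0x5800), A
        RET                 ; Return to BASIC

msg:    DB "FLASH!", 0
        INCLUDE "debug_output.asm"
        END 0x8000
```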
In a high-level language, you write if/else blocks: if (x > 10) { doThis(); } else { doThat(); }. The language handles the branching logic invisibly.
On the Z80, conditionals work through two mechanisms: the flags register and conditional jumps. Every arithmetic or comparison instruction sets flags. You then use those flags to decide whether to jump.
The F register contains several flag bits that reflect the result of the last operation:
Z (Zero): Set if the result was zero.
C (Carry): Set when a result does not fit in 8 bits. For addition, the carry flag is set if the result exceeds 255 — for example, 200 + 100 = 300, which is too large for a single byte, so the carry flag is set and A wraps around to 44 (300 - 256). For subtraction, the carry flag is set if the result would go below 0 — for example, 10 - 20 = -10, which is not a valid unsigned number, so the carry flag is set and A wraps around to 246 (256 - 10). Think of it as the CPU's way of saying "the answer didn't fit" — you can then check the carry flag to detect the overflow, or use ADC and SBC (explained below) to carry the extra bit into the next byte of a multi-byte calculation.
The carry flag is also used by two special arithmetic instructions: ADC (Add with Carry) and SBC (Subtract with Carry). These add or subtract as normal, but also include the carry flag in the calculation. This is how you do maths on numbers larger than 255 — the carry flag acts as the "1" you carry over when adding columns of digits by hand.
Suppose you want to add two 16-bit numbers stored as pairs of bytes (low byte first, as the Z80 stores them). A single ADD can only handle 8 bits, so you add the low bytes first, then add the high bytes with carry to pick up any overflow from the first addition:
; Adding 200 + 100 = 300
; These fit in single bytes, but their sum does not
LD A, 200 ; A = 200
ADD A, 100 ; 200 + 100 = 300, but 300 > 255!
; A wraps to 44 (300 - 256), carry flag is SET
LD C, A ; C = 44 (low byte)
LD A, 0 ; High byte was 0
ADC A, 0 ; 0 + 0 + carry(1) = 1
LD B, A ; Result: B = 1, C = 44 → 256 + 44 = 300
The carry flag did the crucial work here. The low byte addition (200 + 100) overflowed past 255, so A wrapped around to 44 and the carry flag was set. When we then did ADC A, 0 on the high byte, it added 0 + 0 + 1 (the carry) = 1. The final result is split across B and C: B holds 1 (representing 256) and C holds 44, giving 256 + 44 = 300. This is exactly like carrying a digit in long addition.
Here is another example where the low bytes do not overflow, so the carry stays clear:
; Adding 50 + 30 = 80
; Both fit in a byte, and so does the result
LD A, 50 ; A = 50
ADD A, 30 ; 50 + 30 = 80, no overflow, carry flag is CLEAR
LD C, A ; C = 80 (low byte)
LD A, 0 ; High byte was 0
ADC A, 0 ; 0 + 0 + carry(0) = 0
LD B, A ; Result: B = 0, C = 80 → just 80
This time the low bytes added up to 80, which fits in a single byte, so the carry flag stayed clear. ADC added 0 + 0 + 0 = 0, and the high byte remained zero. The same code handles both cases correctly — the carry flag automatically propagates the overflow when it happens and does nothing when it doesn't.
SBC works the same way but for subtraction — it subtracts the value and the carry flag. This handles the "borrow" when the low byte subtraction needs to take 1 from the high byte.
S (Sign): Set if the result was negative (bit 7 is 1).
P/V (Parity/Overflow): Context-dependent—parity for logic ops, overflow for arithmetic.
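To make ADC and SBC concrete, here is a minimal sketch of full 16-bit addition and subtraction done one byte at a time, with non-zero high bytes on both sides. The labels add16, sub16, num1, num2 and result are illustrative names rather than part of this guide's library, and the values 1000 and 500 are arbitrary.

```asm
; Sketch only: add16, sub16, num1, num2 and result are hypothetical names.

; add16: result = num1 + num2 (1000 + 500 = 1500)
add16:
    LD HL, num2
    LD A, (num1)        ; Low byte of num1 (0xE8)
    ADD A, (HL)         ; + low byte of num2 (0xF4) = 0x1DC: A = 0xDC, carry SET
    LD (result), A      ; LD never changes flags, so the carry survives
    INC HL              ; 16-bit INC leaves the flags alone too
    LD A, (num1+1)      ; High byte of num1 (0x03)
    ADC A, (HL)         ; + high byte of num2 (0x01) + carry(1) = 0x05
    LD (result+1), A    ; result = 0x05DC = 1500
    RET

; sub16: result = num1 - num2 (1000 - 500 = 500)
sub16:
    LD HL, num2
    LD A, (num1)        ; Low byte of num1 (0xE8)
    SUB (HL)            ; - low byte of num2 (0xF4): A = 0xF4, carry SET (borrow)
    LD (result), A
    INC HL
    LD A, (num1+1)      ; High byte of num1 (0x03)
    SBC A, (HL)         ; - high byte of num2 (0x01) - carry(1) = 0x01
    LD (result+1), A    ; result = 0x01F4 = 500
    RET

num1:   DW 1000         ; Stored low byte first: 0xE8, 0x03
num2:   DW 500          ; Stored low byte first: 0xF4, 0x01
result: DW 0
```

For values already held in register pairs, the Z80 can do the same job in one instruction with ADD HL, DE or SBC HL, DE. There is no carry-free 16-bit subtract, so clear the carry first (OR A does this) before using SBC HL, DE.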
Jump instructions send execution to a label — a name you give to a specific point in your code, written as a word followed by a colon (e.g. my_label:). The assembler converts the label into a memory address for you. Labels are covered in more detail in the Functions section.
JP Z, label ; Jump if zero flag is set
JP NZ, label ; Jump if zero flag is NOT set
JP C, label ; Jump if carry flag is set
JP NC, label ; Jump if carry flag is NOT set
JP M, label ; Jump if sign flag is set (negative)
JP P, label ; Jump if sign flag is clear (positive)
There is also JR (Jump Relative), which works the same way but encodes the destination as an offset from the current position rather than an absolute address. JR is one byte smaller than JP, but its offset is a single signed byte, so it can only reach from 128 bytes before to 127 bytes after the instruction that follows it, and it only supports four conditions: Z, NZ, C, and NC. Use JR for short, nearby branches and JP when the target is far away or you need the S or P/V flags.
CP B ; Compare A with B (sets flags, doesn't change A)
CP 100 ; Compare A with 100
The CP instruction subtracts without storing the result — it only sets the flags. A is unchanged afterwards. You then use a conditional jump to act on the result:
CP 10 ; Subtract 10 from A (but don't store the result)
JR Z, is_ten ; Jump if A was equal to 10 (Zero flag set)
JR C, less_than ; Jump if A was less than 10 (Carry flag set)
; If we reach here, A was greater than 10
Think of it as asking a question: CP asks "how does A compare?", and the conditional jump acts on the answer. The common patterns are:
CP n then JR Z: jump if A equals n
CP n then JR NZ: jump if A does not equal n
CP n then JR C: jump if A is less than n
CP n then JR NC: jump if A is greater than or equal to n
Here's how to implement if (a == 10) { b = 1 } else { b = 0 }, with debug output to verify:
ORG 0x8000
CALL debug_init
LD A, (var_a) ; Load a
CP 10 ; Compare with 10 (sets Z if equal)
JP NZ, else_block ; If not equal, jump to else
; If block (a == 10)
LD A, 1
LD (var_b), A
LD HL, msg_equal
CALL print_string ; Print confirmation
JP end_if ; Skip the else block
else_block:
; Else block (a != 10)
LD A, 0
LD (var_b), A
LD HL, msg_noteq
CALL print_string ; Print confirmation
end_if:
LD A, 13
RST 0x10 ; Newline
RET
msg_equal: DB "a equals 10", 0
msg_noteq: DB "a is not 10", 0
var_a: DB 10
var_b: DB 0
INCLUDE "debug_output.asm"
END 0x8000
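For greater-than or less-than comparisons on unsigned values, use CP and check the carry flag: after CP B, carry is set if A < B, and both carry and zero are clear if A > B. Signed comparisons need the sign flag and the overflow flag (P/V) together, because the sign of the result on its own is wrong whenever the subtraction overflows.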
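A signed less-than test can be sketched as follows. This is one common idiom rather than the only approach, and every label in it (val_a, val_b, is_less and the jump targets) is a made-up name for illustration. It uses SUB rather than CP because the fix-up step needs the actual result in A.

```asm
; Sketch only: all labels here are hypothetical.
; Sets is_less to 1 if val_a < val_b, treating both bytes as SIGNED (-128..127)
    LD A, (val_b)
    LD B, A             ; B = b
    LD A, (val_a)       ; A = a
    SUB B               ; A = a - b (unlike CP, this keeps the result)
    JP PO, sign_ok      ; P/V clear = no overflow, so the sign flag is correct
    XOR 0x80            ; Overflow: flip bit 7, which also corrects the sign flag
sign_ok:
    JP M, a_less        ; Sign set: a < b
    LD A, 0             ; Otherwise a >= b
    JR store
a_less:
    LD A, 1
store:
    LD (is_less), A
    RET

val_a:   DB 0x9C        ; -100 as a signed byte (156 if read as unsigned)
val_b:   DB 100
is_less: DB 0           ; Ends up 1, because -100 < 100
```

With these values an unsigned test (CP then JR C) would give the opposite answer, since 156 is not less than 100. Which test is right depends entirely on what the bytes are meant to represent.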
Key Difference: There's no implicit 'else' structure. You must explicitly jump over blocks you don't want to execute. Every branch is a goto by another name. The structure exists only in how you lay out your labels and jumps.
Loops have dedicated syntax: for, while, do-while, foreach. The language manages the iteration variable, the condition check, and the repetition.
Loops are built from labels, conditional jumps, and decrement instructions. The Z80 does have one special instruction that helps: DJNZ.
DJNZ stands for 'Decrement B and Jump if Not Zero'. It's a built-in counted loop primitive:
LD B, 10 ; Loop 10 times
loop_start:
; ... loop body ...
DJNZ loop_start ; Decrement B, jump if B != 0
This is the assembly equivalent of for (int i = 10; i > 0; i--). Note that it counts down, not up, and always uses the B register.
A while loop tests a condition at the top and repeats until that condition is false. Here is the general pattern — it loads the variable from memory on every iteration, checks it, does the work, stores it back, and loops:
; while (a != 0) { a--; }
while_start:
LD A, (var_a) ; Load the current value from memory into A
CP 0 ; Is it 0?
JP Z, while_end ; If yes, exit the loop
DEC A ; Subtract 1
LD (var_a), A ; Store the new value back to memory
JP while_start ; Jump back to the top and check again
while_end:
var_a: DB 5 ; Starting value — the loop will run 5 times
Notice that A is loaded from memory every time around the loop. Why? Because the Z80 has no instruction that compares or decrements a byte at a named address: you cannot write CP (var_a) or DEC (var_a). Arithmetic happens in registers (or, for a few instructions, on the byte that HL points to), so the general pattern is: load from memory, work on it in a register, store it back.
In this simple case, though, that reload is wasteful. After DEC A, the register already holds the updated value — there is no need to store it and immediately load it again. A more efficient version keeps everything in A and only touches memory at the start and end:
; Same result, but faster — no unnecessary memory access inside the loop
LD A, (var_a) ; Load once before the loop
while_start:
CP 0 ; Is A 0?
JP Z, while_end ; If yes, done
DEC A ; A still holds the value — no need to reload
JP while_start ; Loop back (A carries the value with it)
while_end:
LD (var_a), A ; Store the final result (0) back to memory
The first version is the safe, general pattern — you would need it if the loop body used A for other things (like printing or calling a subroutine) and its value got overwritten. The second version works when nothing else in the loop touches A, so the value survives from one iteration to the next.
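A third variant is possible because a handful of instructions can work on the byte that HL points to. The sketch below assumes the same var_a as the earlier versions and decrements the variable in place with DEC (HL), so there is no separate store step.

```asm
; Sketch: while (a != 0) { a--; } using HL as a pointer to var_a
    LD HL, var_a        ; HL = address of the variable
while_start:
    LD A, (HL)          ; Read the current value
    OR A                ; Sets Z if A is 0 (one byte shorter than CP 0)
    JR Z, while_end     ; If zero, exit the loop
    DEC (HL)            ; Decrement the byte in memory directly
    JR while_start      ; The target is close by, so JR is enough
while_end:
    RET

var_a: DB 5             ; Starting value: the loop body runs 5 times
```

This keeps memory up to date on every pass, like the first version, while leaving A free for other work between the test and the decrement.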
Here is how to sum all bytes in an array and display the result. The simplest approach is to hardcode the length:
LD HL, my_array ; Point HL at start of array
LD B, 5 ; Array length — hardcoded
XOR A ; Clear A (running total)
sum_loop:
ADD A, (HL) ; Add current byte to total
INC HL ; Move to next byte
DJNZ sum_loop ; Repeat B times
CALL print_number ; Display the sum on screen
LD A, 13
RST 0x10 ; Newline
my_array: DB 10, 20, 30, 40, 50
This works, but the length is baked into the code. If you add or remove elements from the array, you must remember to update the LD B, 5 line as well — a common source of bugs.
A better approach is to make the array carry its own length as the first byte. The data follows immediately after:
my_array: DB 5, 10, 20, 30, 40, 50
; ^ ^--- the actual data starts here
; |
; length byte (5 elements follow)
Now the loop reads the length at runtime instead of having it hardcoded:
LD HL, my_array ; HL points to the length byte
LD B, (HL) ; B = first byte = 5 (the length)
INC HL ; Move HL past the length byte to the actual data
XOR A ; Clear A (running total)
sum_loop:
ADD A, (HL) ; Add current byte to total
INC HL ; Move to next byte
DJNZ sum_loop ; Repeat B times
CALL print_number ; Display the sum on screen
LD A, 13
RST 0x10 ; Newline
With this approach, you can change the array data freely — just keep the length byte accurate. The Z80 has no built-in way to know how long an array is, so you must always track the length yourself, either by hardcoding it, storing it alongside the data like this, or using a terminator value (like the null terminator we use for strings).
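One trap with a runtime length is an empty array. DJNZ decrements B before testing it, so entering the loop with B = 0 runs the body 256 times instead of zero. A minimal guard, using the same my_array layout and print_number routine as above, might look like this (sum_done is an added label):

```asm
; Sketch: sum a length-prefixed array, coping with a length of 0
    LD HL, my_array     ; HL points to the length byte
    LD A, (HL)          ; A = length
    OR A                ; Is the length 0?
    JR Z, sum_done      ; Yes: skip the loop (A is already 0, the correct sum)
    LD B, A             ; B = loop count
    INC HL              ; Move HL past the length byte
    XOR A               ; Clear A (running total)
sum_loop:
    ADD A, (HL)         ; Add current byte to total
    INC HL              ; Move to next byte
    DJNZ sum_loop       ; Repeat B times
sum_done:
    CALL print_number   ; Display the sum on screen
    LD A, 13
    RST 0x10            ; Newline
    RET

my_array: DB 5, 10, 20, 30, 40, 50
```

Change the data line to DB 0 and the program prints 0 rather than summing 256 bytes of whatever follows in memory.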
In a high-level language, adding to an array is simple — my_array.push(60) or my_array.append(60). The language handles resizing the array if needed. In assembly, there is no resizing. Memory is fixed. If you want to add elements at runtime, you must reserve extra space upfront:
my_array: DB 3, 10, 20, 30 ; Length byte (3), followed by 3 values
DS 5 ; Reserve 5 extra bytes of room to grow
The array currently holds 3 elements but has room for up to 8. To add the value 60 to the end, we need to: read the current length, calculate where the new element goes, write it there, and update the length byte:
; Add 60 to the end of my_array
LD HL, my_array ; HL points to the length byte
LD A, (HL) ; A = current length (3)
INC A ; A = new length (4)
LD (HL), A ; Update the length byte in memory
LD D, 0
LD E, A ; DE = new length (4)
ADD HL, DE ; HL = my_array + 4 = address of the new last element
LD (HL), 60 ; Write 60 there
After this, the array in memory is: 4, 10, 20, 30, 60 — the length byte has been updated to 4 and the new value sits at the end.
There is no bounds checking. If you add more elements than you reserved space for, you will silently overwrite whatever comes next in memory — other variables, other code, anything. There is no error, no warning. It is entirely your responsibility to check the length before adding and make sure you do not exceed the reserved space.
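One way to build that check in is to wrap the append in a subroutine that refuses to write when the array is full. The sketch below assumes the 8-byte layout shown above; ARRAY_CAP and append_byte are hypothetical names, and reporting failure through the carry flag is simply one possible convention.

```asm
; Sketch only: ARRAY_CAP and append_byte are hypothetical names.
ARRAY_CAP EQU 8         ; 3 initial elements + 5 reserved bytes

; append_byte: add the value in C to the end of my_array
; Output: carry clear = stored, carry set = array full (nothing written)
append_byte:
    LD HL, my_array     ; HL points to the length byte
    LD A, (HL)          ; A = current length
    CP ARRAY_CAP        ; Carry set if length < capacity
    CCF                 ; Invert it: carry set if length >= capacity
    RET C               ; Full: return with carry set
    INC A               ; A = new length
    LD (HL), A          ; Update the length byte in memory
    LD E, A
    LD D, 0             ; DE = new length
    ADD HL, DE          ; HL = address of the new last element
    LD (HL), C          ; Store the value
    OR A                ; Clear carry to signal success
    RET

my_array: DB 3, 10, 20, 30
          DS 5          ; Room to grow
```

A caller would load C with the value (for example LD C, 60), CALL append_byte, and then test the result with JR C to a label of its own that handles the full case.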
Key Difference: Infinite loops are trivially easy—just JP back to a label with no exit condition. There's no for-each; you manually track the pointer and count. Off-by-one errors are common and unforgiving.
Copy the program below into VS Code, save it as loops.asm, then assemble and run it. It cycles the screen border through all eight colours, holding each one for half a second so you can see it change. It uses both loop patterns from this section at the same time: a DJNZ inner loop for the delay, and a while-style outer loop to step through the colours.
The border colour is controlled by hardware port 0xFE. Writing a value to it with OUT (0xFE), A sets the border immediately — bits 2:0 of A select the colour (0=black, 1=blue, 2=red, 3=magenta, 4=green, 5=cyan, 6=yellow, 7=white). You will see this instruction again in the Hardware chapter; for now, just treat it as "set border to the colour number in A".
HALT suspends the CPU until the next hardware interrupt. The Spectrum fires an interrupt 50 times per second, so each HALT waits roughly 20ms. Looping 25 times gives a half-second pause between colour changes — long enough to see each one clearly.
ORG 0x8000
LD C, 0 ; C = current border colour, starting at 0
colour_loop:
LD A, C
OUT (0xFE), A ; Set the border to colour C
; Inner DJNZ loop: 25 HALTs x 20ms each = ~0.5 seconds
LD B, 25
delay_loop:
HALT ; Wait for the next interrupt (~20ms)
DJNZ delay_loop ; Repeat 25 times
; Advance to the next colour and loop while C < 8
INC C
LD A, C
CP 8 ; Have we shown all 8 colours (0-7)?
JP NZ, colour_loop ; No — go back for the next one
RET ; Yes — all colours shown, return to BASIC
END 0x8000
The border should step through black, blue, red, magenta, green, cyan, yellow, and white, spending about half a second on each. Notice how the two loops nest: the outer colour_loop runs 8 times driven by the CP 8 / JP NZ condition, and each pass through it runs the inner delay_loop 25 times via DJNZ before moving to the next colour.
Functions have names, parameters, return values, local variables, and scope. The language manages the call stack, passes arguments, and returns values according to well-defined conventions.
Functions are called 'subroutines'. CALL pushes the return address onto the stack and jumps. RET pops that address and returns. Everything else—parameters, return values, local storage—is your responsibility.
; Main code
LD A, 5 ; Set up parameter in A
CALL double_it ; Call subroutine
CALL print_number ; Display result - should show 10
LD A, 13
RST 0x10 ; Newline
RET ; End of main code (without this, execution would fall through into double_it)
; Subroutine: doubles the value in A
double_it:
ADD A, A ; A = A * 2
RET ; Return to caller
The stack is a region of memory that grows downward. SP (stack pointer) points to the top. PUSH adds values, POP removes them:
PUSH BC ; Save BC on stack (SP decreases by 2)
PUSH DE ; Save DE
; ... do work that modifies BC and DE ...
POP DE ; Restore DE (LIFO order!)
POP BC ; Restore BC
This is how you preserve registers across subroutine calls. If your subroutine uses BC but the caller needs it preserved, push it at the start and pop it before returning.
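As an illustration, the sketch below has a caller counting with B while it calls a subroutine that also needs B for its own DJNZ loop. The names star_loop, short_pause and pause_loop are invented for this example, and it assumes the screen channel is already open (for instance via debug_init) and that interrupts are enabled, as they are when the code is started from BASIC.

```asm
; Sketch only: star_loop, short_pause and pause_loop are hypothetical names.
    LD B, 3             ; Caller's loop counter lives in B
star_loop:
    LD A, '*'
    RST 0x10            ; Print a star
    CALL short_pause    ; Uses B internally, but hands it back intact
    DJNZ star_loop      ; Still counts 3, 2, 1 correctly
    RET

; short_pause: wait about a fifth of a second. Preserves BC.
short_pause:
    PUSH BC             ; Save the caller's BC
    LD B, 10            ; 10 interrupts at ~20ms each
pause_loop:
    HALT                ; Wait for the next interrupt
    DJNZ pause_loop     ; B ends at 0 here
    POP BC              ; Restore the caller's BC
    RET
```

Remove the PUSH and POP and the outer loop never finishes: B comes back as 0 every time, and DJNZ then wraps it to 255 and jumps again.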
There's no enforced convention. Common approaches include:
Registers: Put parameters in specific registers before CALL. Fast and simple for few parameters.
Stack: PUSH parameters before CALL and retrieve them in the subroutine, taking care with the return address that CALL leaves on top of them. More flexible but slower.
Fixed memory: Store parameters at known addresses. Simple but not re-entrant.
; Example: pass two parameters via registers
; add_numbers: adds B and C, returns result in A
LD B, 10
LD C, 20
CALL add_numbers
CALL print_number ; Display result - should show 30
LD A, 13
RST 0x10 ; Newline
RET ; End of main code (without this, execution would fall through into add_numbers)
add_numbers:
LD A, B
ADD A, C
RET
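The stack approach needs a little more care, because CALL places the return address on top of whatever the caller pushed. A minimal sketch follows; show_value is a hypothetical name, print_number comes from the debug routines, and debug_init is assumed to have been called already.

```asm
; Sketch only: show_value is a hypothetical name.
    LD HL, 42
    PUSH HL             ; Push the parameter
    CALL show_value     ; CALL pushes the return address on top of it
    LD A, 13
    RST 0x10            ; Newline
    RET

; show_value: prints the low byte of the 16-bit value passed on the stack
show_value:
    POP DE              ; DE = return address (it sits on top)
    POP HL              ; HL = the parameter
    PUSH DE             ; Put the return address back for RET
    LD A, L             ; A = 42
    CALL print_number
    RET
```

The subroutine removes its own parameter here, so the stack is balanced when it returns. Forgetting the final PUSH DE would make RET jump to whatever happened to be next on the stack.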
Key Difference: There's no scope, no automatic cleanup, no type safety. If a subroutine corrupts a register the caller was using, the bug may manifest far from its cause. Documentation and discipline replace compiler enforcement.
Arrays, lists, dictionaries, objects, tuples—rich data structures with automatic bounds checking, dynamic resizing, and convenient access syntax.
Everything is bytes in memory. Arrays are contiguous bytes. Structures are memory layouts you document and maintain manually. Pointers are just 16-bit values in register pairs.
An array is just a block of consecutive bytes in memory. You can define one using DB with a list of values, as we have already seen. But if you want to reserve space without setting initial values — for example, a buffer to fill in later — you use DS (Define Space). DS tells the assembler to set aside a number of bytes, all initialized to zero. It does not create a new data type; it simply reserves empty room in memory.
; Define an array of 10 bytes, initialized to zero
scores: DS 10 ; DS = Define Space (reserves 10 bytes, all zero)
; Or define an array with initial values
primes: DB 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
The primes array is laid out in memory as 10 consecutive bytes. If primes starts at address 40000, it looks like this:
; Address Index Value
; 40000 [0] 2
; 40001 [1] 3
; 40002 [2] 5
; 40003 [3] 7
; 40004 [4] 11
; ...and so on
To access a specific element, we need to calculate its address: the start of the array plus the index. In a high-level language you would write primes[3] to get the fourth element (7). In Z80 assembly, we do the same calculation manually — but there is a catch. The Z80 has no instruction to add an 8-bit register to HL. ADD HL, DE exists, but ADD HL, B does not. So we must put our index into a 16-bit register pair (DE) first, with D set to 0 since our index is small enough to fit in the low byte alone:
; Access element 3 of the primes array (0-based, so this is the fourth value)
LD B, 3 ; B = the index we want
LD HL, primes ; HL = address of the first element (e.g. 40000)
LD D, 0 ; D = 0 (high byte of index, zero because index < 256)
LD E, B ; E = 3 (low byte of index — now DE = 3)
ADD HL, DE ; HL = 40000 + 3 = 40003
LD A, (HL) ; Read the byte at address 40003 — A = 7
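When each element is wider than a byte, the index has to be scaled before it is added to the base address. The sketch below looks up a 16-bit entry, so the index is doubled first; the distances table and its values are invented for illustration.

```asm
; Sketch only: distances is a hypothetical table of 16-bit values.
; Read element 2 (the third entry) into DE
    LD A, 2             ; A = the index we want
    LD L, A
    LD H, 0             ; HL = index
    ADD HL, HL          ; HL = index * 2 (each entry is 2 bytes)
    LD DE, distances    ; DE = address of the first entry
    ADD HL, DE          ; HL = address of entry 2
    LD E, (HL)          ; Low byte first (little-endian)
    INC HL
    LD D, (HL)          ; High byte: DE = 3000
    RET

distances: DW 1000, 2000, 3000, 4000
```

The same pattern extends to larger records: multiply the index by the record size, then add the base address.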
In a high-level language, you might group related data together using a class or struct:
// High-level language equivalent
// class Player {
// x = 0 (1 byte)
// y = 0 (1 byte)
// health = 0 (1 byte)
// score = 0 (2 bytes)
// lives = 0 (1 byte)
// }
The Z80 has no concept of classes, structs, or objects. But we can achieve the same thing by reserving a block of bytes and deciding which byte means what. A "structure" in assembly is just a convention — a documented agreement that "byte 0 is the x position, byte 1 is the y position" and so on. The CPU does not enforce this; it is entirely up to you to read and write the correct bytes.
First, we define the layout using EQU (short for "equals"). EQU gives a name to a number — it does not reserve any memory or generate any code. It simply tells the assembler "wherever you see PLAYER_X, substitute 0" and so on. This makes the code readable instead of being full of mystery numbers:
; Player structure layout: 6 bytes total
PLAYER_X EQU 0 ; Offset 0: x position (1 byte)
PLAYER_Y EQU 1 ; Offset 1: y position (1 byte)
PLAYER_HP EQU 2 ; Offset 2: health (1 byte)
PLAYER_SCORE EQU 3 ; Offset 3: score (2 bytes)
PLAYER_LIVES EQU 5 ; Offset 5: lives (1 byte)
player1: DS 6 ; Reserve 6 bytes of memory for one player
If player1 starts at address 40000, the memory looks like this:
; Address Offset Field
; 40000 +0 x position
; 40001 +1 y position
; 40002 +2 health
; 40003 +3 score (low byte)
; 40004 +4 score (high byte)
; 40005 +5 lives
To access these fields, we use the IX register. IX is a special 16-bit index register that supports something the ordinary register pairs do not: a built-in offset, written with the + sign. (Its twin, IY, works the same way, but the Spectrum ROM reserves IY for its own use.) When you write (IX+2), it means "the byte at the address in IX, plus 2." The CPU calculates the address for you: if IX holds 40000, then (IX+2) reads from address 40002. The offset is a signed byte, so it can reach from -128 to +127 relative to IX. This is perfect for structures because IX points to the start of the structure and the offset jumps to the field you want:
; Point IX at our player data
LD IX, player1 ; IX = address of player1 (e.g. 40000)
; Read a field — equivalent to: hp = player1.health
LD A, (IX+PLAYER_HP) ; PLAYER_HP is 2, so this reads from address 40002
; Modify it — equivalent to: player1.health = player1.health - 1
DEC A ; Subtract 1 from health
LD (IX+PLAYER_HP), A ; Write the new value back to address 40002
; Write a field — equivalent to: player1.x = 100
LD (IX+PLAYER_X), 100 ; PLAYER_X is 0, so this writes to address 40000
Without EQU, that last line would be LD (IX+0), 100 — correct but meaningless to anyone reading the code. The named constants make it clear that we are setting the x position, not just writing to "offset 0." This is as close to player1.x = 100 as assembly gets.
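The two-byte score field takes a little more work, because it has to be read and written one byte at a time. Here is a minimal sketch that adds 50 points; add_points is a hypothetical name, and the layout lines are repeated only so the fragment stands alone.

```asm
; Sketch only: add_points is a hypothetical name.
PLAYER_SCORE EQU 3      ; Offset 3: score (2 bytes, low byte first)

; add_points: player1.score = player1.score + 50
add_points:
    LD IX, player1
    LD L, (IX+PLAYER_SCORE)     ; Low byte of score
    LD H, (IX+PLAYER_SCORE+1)   ; High byte of score
    LD DE, 50
    ADD HL, DE                  ; HL = score + 50
    LD (IX+PLAYER_SCORE), L     ; Write the low byte back
    LD (IX+PLAYER_SCORE+1), H   ; Write the high byte back
    RET

player1: DS 6           ; One player record, all fields zero
```

If this is pasted into a file that already defines PLAYER_SCORE and player1, drop the repeated definitions to avoid duplicate-symbol errors.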
Strings are just arrays of bytes with a terminator. Since we have the debug routines, printing them is straightforward:
; Null-terminated string
greeting: DB "Hello, World!", 0
; Print using our debug routine
LD HL, greeting
CALL print_string ; Displays: Hello, World!
LD A, 13
RST 0x10 ; Newline
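Because the terminator is the only record of where a string ends, finding its length means walking it. A minimal sketch, assuming the greeting string and print_number routine above (str_len is a hypothetical name):

```asm
; Sketch only: str_len is a hypothetical name.
; str_len: count the characters in a null-terminated string
; Input: HL = address of string. Output: B = length (wraps past 255)
str_len:
    LD B, 0             ; B = character count
str_len_loop:
    LD A, (HL)          ; Fetch the next byte
    OR A                ; Is it the terminator?
    RET Z               ; Yes: B holds the length
    INC HL              ; Move to next character
    INC B               ; Count it
    JR str_len_loop

; Usage:
;   LD HL, greeting
;   CALL str_len
;   LD A, B
;   CALL print_number   ; Displays 13 for "Hello, World!"
```

The usage lines are shown as comments so they are not executed by accident when control falls off the end of the subroutine.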
Key Difference: No bounds checking whatsoever. Read past the end of an array and you get whatever garbage is in memory. Write past the end and you corrupt other data or code. Buffer overflows are not just possible—they're easy.
print(), console.log(), scanf(), file streams—abstract interfaces that hide hardware details. The OS and runtime handle the actual communication with devices.
The ZX Spectrum provides two main ways to interact with hardware: the ROM routines (which we use in our debug output library) and direct hardware access via I/O ports and memory-mapped I/O.
The debug routines in this guide use the Spectrum ROM for printing. The key ROM entry points are:
; Open a channel (2 = main screen, 1 = lower screen)
LD A, 2
CALL 0x1601
; Print a single character
LD A, 65 ; 'A'
RST 0x10
; Clear the screen
CALL 0x0DAF
On the ZX Spectrum, the screen memory starts at address 16384 (0x4000). Writing directly to this memory changes what appears on screen:
; Fill ZX Spectrum screen with a pattern
LD HL, 16384 ; Screen memory start
LD DE, 16385
LD BC, 6143 ; Pixel area is 6144 bytes; LD (HL), 255 writes the first, LDIR copies it into the other 6143
LD (HL), 255 ; Fill pattern
LDIR ; Block copy (fills screen)
; Read keyboard half-row (ZX Spectrum)
; Port 0xFE, high byte of address selects row
LD A, 0xFD ; Row with A, S, D, F, G
IN A, (0xFE) ; Read key states
BIT 0, A ; Test bit 0 (A key)
JP Z, a_pressed ; Bit is 0 when key pressed
Key Difference: There's no abstraction. You must know the hardware addresses, timing requirements, and data formats for every device you want to use. The debug output routines in this guide wrap the ROM calls so you don't have to remember the details every time.
Try/catch blocks, exceptions, error types, stack traces, recovery mechanisms. Errors propagate up the call stack automatically. You can catch them at appropriate levels and handle gracefully.
There are no exceptions. Error handling means checking for errors explicitly after every operation that might fail, and deciding what to do yourself.
The most common pattern is returning status in a register or flag. Here's a division subroutine that checks for divide-by-zero and reports the outcome:
; Division subroutine with error checking
; Input: A = dividend, B = divisor
; Output: A = quotient, carry flag set on error
divide:
LD C, A ; Save dividend
LD A, B
CP 0 ; Check for divide by zero
JP Z, div_error
; ... perform division ...
OR A ; Clear carry (success)
RET
div_error:
SCF ; Set carry flag (error)
RET
; Caller checks the result:
CALL divide
JP C, handle_error ; Carry set = error
; Success - display quotient
CALL print_number
LD A, 13
RST 0x10 ; Newline
JP continue
handle_error:
LD HL, msg_diverr
CALL print_string
LD A, 13
RST 0x10 ; Newline
continue:
msg_diverr: DB "Error: divide by zero!", 0
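The division itself is elided above. One simple way to fill it in is repeated subtraction, sketched below under the same carry-flag convention. The name divide8 and its internal labels are hypothetical, and returning the remainder in B is an extra choice made for this example.

```asm
; Sketch only: divide8 and its labels are hypothetical names.
; Input:  A = dividend, B = divisor
; Output: A = quotient, B = remainder, carry set on divide by zero
divide8:
    LD C, A             ; Save dividend
    LD A, B
    OR A                ; Is the divisor 0?
    JR Z, div8_error
    LD A, C             ; A = running remainder
    LD C, 0             ; C = quotient so far
div8_loop:
    CP B                ; Carry set if remainder < divisor
    JR C, div8_done     ; Nothing more to subtract
    SUB B               ; Take one divisor away
    INC C               ; Count it
    JR div8_loop
div8_done:
    LD B, A             ; B = remainder
    LD A, C             ; A = quotient
    OR A                ; Clear carry (success)
    RET
div8_error:
    SCF                 ; Set carry flag (error)
    RET
```

With A = 17 and B = 5 this returns A = 3 and B = 2. Repeated subtraction is slow for large quotients; a shift-and-subtract routine is faster but longer.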
The Z80 supports hardware interrupts for handling external events:
Maskable Interrupts (INT): Can be enabled with EI and disabled with DI. Used for things like vertical blank in games.
Non-Maskable Interrupt (NMI): Cannot be disabled. Jumps to address 0x0066. Used for critical events.
Key Difference: Unhandled errors don't throw exceptions—they cause undefined behavior. The program might crash, corrupt memory, enter an infinite loop, or appear to work while producing wrong results. Defensive programming isn't optional; it's survival. Use the debug output routines liberally to verify your assumptions.
Debuggers with breakpoints, watches, stack traces. Print statements. Exception messages. Profilers. Memory analyzers. Rich error messages pointing to line numbers.
Debugging is harder but not impossible. You have two main strategies: emulator debugging tools and print-based debugging using the routines from the start of this guide.
Fuse (and most Z80 emulators) includes a built-in debugger that lets you step through instructions one at a time, set breakpoints at addresses or conditions, inspect and modify registers, view and edit memory, and watch memory locations for changes.
The debug output routines at the start of this guide are your equivalent of console.log() or print(). Use them to verify your program's behaviour at every step:
; Debugging example: verify a loop counter
LD B, 5
debug_loop:
PUSH BC ; Save B
LD A, B
CALL print_number ; Print current counter value
LD A, ' '
CALL print_char ; Print space separator
POP BC ; Restore B
DJNZ debug_loop
LD A, 13
RST 0x10 ; Newline
; Should display: 5 4 3 2 1
You can also use the routines to create simple assertion checks:
; Assert: A should equal expected value
; If not, print an error message and the actual value
assert_a_equals:
; B = expected value, A = actual value
CP B
RET Z ; Pass - values match
; Fail - display error
PUSH AF
LD HL, msg_assert
CALL print_string
POP AF
CALL print_number ; Show what A actually was
LD A, 13
RST 0x10 ; Newline
RET
msg_assert: DB "ASSERT FAIL! A = ", 0
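A short usage sketch, assuming the double_it subroutine from the Functions section and a prior call to debug_init, shows one passing and one deliberately failing check:

```asm
; Sketch: exercising assert_a_equals with double_it
    LD A, 5
    CALL double_it          ; A = 10
    LD B, 10                ; Expected value
    CALL assert_a_equals    ; Values match: prints nothing

    LD A, 5
    CALL double_it          ; A = 10
    LD B, 11                ; Deliberately wrong expectation
    CALL assert_a_equals    ; Prints the ASSERT FAIL message and the actual value
    RET
```

Leaving checks like the first one in place costs little, and they stay silent until something breaks.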
Off-by-one errors in loops (DJNZ counts from n to 1, not n-1 to 0)
Forgetting to preserve registers across subroutine calls
Stack imbalance (more pushes than pops, or vice versa)
Little-endian confusion in 16-bit values
Confusing immediate values with memory addresses: LD A, 10 vs LD A, (10)
Flag clobbering—an instruction modifies flags you were about to test
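Flag clobbering is the hardest of these to spot by eye, so here is a minimal before-and-after sketch. All labels are invented for illustration.

```asm
; Sketch only: all labels here are hypothetical.

; BUGGY: the flag from CP is overwritten before it is tested
buggy:
    LD A, (value)
    CP 10               ; Z set if value == 10
    INC B               ; 8-bit INC rewrites Z from B's new value
    JR Z, buggy_ten     ; Now asks "did B wrap to 0?", not "was value 10?"
    RET
buggy_ten:
    RET

; FIXED: nothing that changes flags sits between CP and the jump
fixed:
    INC B               ; Do the flag-changing work first
    LD A, (value)       ; LD leaves the flags alone
    CP 10
    JR Z, fixed_ten     ; Tests the comparison, as intended
    RET
fixed_ten:
    RET

value: DB 10
```

When in doubt, check the flag column for each instruction in the Z80 manual: the 8-bit INC and DEC change Z and S but not carry, while the 16-bit forms change nothing.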
Key Difference: Errors often manifest far from their source. A stack imbalance might not crash until three subroutine returns later. A memory corruption might not be noticed until that memory is read minutes later. Use the print_number and print_string routines frequently during development to catch bugs early.
Let's tie everything together with a complete, assembler-ready program that counts from 0 to 20, displays each value, and verifies the final count. Save this as counter.asm:
ORG 0x8000
CALL debug_init
; Print header
LD HL, msg_header
CALL print_string
LD A, 13
RST 0x10 ; Newline
; Count from 0 to 20
LD C, 0 ; C = counter
main_loop:
LD A, C
CALL print_number ; Display current value
LD A, ' '
CALL print_char ; Space separator
INC C
LD A, C
CP 21 ; Reached 21 yet?
JP NZ, main_loop ; No - keep counting
; Finished counting
LD A, 13
RST 0x10 ; Newline
LD A, 13
RST 0x10 ; Newline
; Verify: C should be 21
LD A, C
CP 21
JP NZ, verify_fail
LD HL, msg_pass
CALL print_string
JP done
verify_fail:
LD HL, msg_fail
CALL print_string
LD A, C
CALL print_number
done:
LD A, 13
RST 0x10 ; Newline
RET ; Return to BASIC
msg_header: DB "Counting 0 to 20:", 0
msg_pass: DB "Verification PASSED", 0
msg_fail: DB "FAILED! C = ", 0
INCLUDE "debug_output.asm"
END 0x8000
Assemble and run:
pasmo --tapbas counter.asm counter.tap
This program demonstrates register operations, subroutine calls and returns (with the stack quietly holding each return address), loops with manual construction, comparison and conditional jumps, and, crucially, debug output to verify that the code is working correctly.
Throughout this guide you've been using RST 0x10 and CALL 0x1601 without much explanation. This chapter explains what these are: the Z80's built-in shortcut instructions and the ZX Spectrum's library of ROM routines.
The Z80 has a set of special single-byte CALL instructions called RST (Restart). Where a normal CALL takes 3 bytes (the opcode plus a 2-byte address), an RST takes just 1 byte. The CPU pushes the current PC onto the stack and jumps to a fixed address, exactly like a CALL. When the routine hits RET, execution returns to where you left off.
Think of them as built-in function calls — shorthand for the most commonly used routines. On the ZX Spectrum, the ROM at addresses 0x0000–0x3FFF contains Sinclair's operating system, and the RST addresses are entry points into key parts of it.
The pattern is always the same: load registers with the values the routine expects, then execute the RST or CALL. It's the assembly equivalent of calling a library function.
There are exactly 8 RST instructions. Each jumps to a fixed address that is a multiple of 8:
| Instruction | Jumps To | Bytes | ZX Spectrum Purpose |
|---|---|---|---|
| RST 0x00 | 0x0000 | 1 | Reset / power-on entry point |
| RST 0x08 | 0x0008 | 1 | Error restart (reports errors to BASIC) |
| RST 0x10 | 0x0010 | 1 | Print character in A to current channel |
| RST 0x18 | 0x0018 | 1 | Collect current character from BASIC line |
| RST 0x20 | 0x0020 | 1 | Collect next character from BASIC line |
| RST 0x28 | 0x0028 | 1 | Calculator engine (floating-point maths) |
| RST 0x30 | 0x0030 | 1 | Create space in memory (BC bytes) |
| RST 0x38 | 0x0038 | 1 | Maskable interrupt handler |
The most commonly used is RST 0x10 — the Spectrum's workhorse print routine. Almost every program that displays text uses it.
Beyond the 8 RST entry points, the Spectrum ROM contains hundreds of useful routines that you call with a regular CALL instruction. Here are the most useful ones:
| Address | Name | Setup | What It Does |
|---|---|---|---|
| 0x0010 | PRINT-A | A = character | Print character to current stream |
| 0x1601 | CHAN-OPEN | A = channel (2=screen) | Open an output channel |
| 0x203C | PR-STRING | DE = address, BC = length | Print string of BC bytes from DE |
| 0x0D6B | CLS | (none) | Clear screen |
| 0x0DD9 | CL-SET | B = 24 minus row (0–23), C = 33 minus column (0–31) | Set print position (AT equivalent) |
| Address | Name | Returns | What It Does |
|---|---|---|---|
| 0x028E | KEY-SCAN | DE = key code | Scan the keyboard for a keypress |
| 0x15D4 | WAIT-KEY | A = key code | Wait for a keypress from the current input channel (not the screen, which is output-only) |
| Address | Name | Setup | What It Does |
|---|---|---|---|
| 0x03B5 | BEEPER | HL = pitch (437500 / frequency − 30.125), DE = duration (frequency × seconds) | Play a tone through the beeper |
The simplest possible use — print the letter 'A' to the screen:
ORG 0x8000
; Open channel 2 (main screen)
LD A, 2
CALL 0x1601
; Print 'A' using RST 0x10
LD A, 'A'
RST 0x10
RET
END 0x8000
Notice how RST 0x10 is just one byte. The equivalent CALL 0x0010 would do the same thing but take 3 bytes. In a program with hundreds of print calls scattered through the code, those 2 bytes per call add up.
Print a null-terminated string by looping over each character:
ORG 0x8000
; Open channel 2 (main screen)
LD A, 2
CALL 0x1601
; Point HL at our message
LD HL, message
print_loop:
LD A, (HL) ; load next character
CP 0 ; is it the null terminator?
JP Z, done ; if yes, we're finished
RST 0x10 ; print it
INC HL ; move to next character
JP print_loop
done:
RET
message:
DB "Hello from Z80!", 0
END 0x8000
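Use the ROM's CL-SET routine to position the cursor before printing. CL-SET counts rows and columns backwards: B holds 24 minus the row and C holds 33 minus the column, so the top-left corner is B = 24, C = 33:
ORG 0x8000
; Open channel 2
LD A, 2
CALL 0x1601
; Set print position to row 10, column 5
LD B, 24 - 10 ; 24 minus the row (0-23)
LD C, 33 - 5 ; 33 minus the column (0-31)
CALL 0x0DD9 ; CL-SET: position the cursor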
; Now print a message at that position
LD A, 'X'
RST 0x10
RET
END 0x8000
Instead of looping character by character, use the ROM's built-in string printer. You pass it a pointer and a length, and it does the rest:
ORG 0x8000
; Open channel 2
LD A, 2
CALL 0x1601
; Print a string using the ROM block print
LD DE, message ; DE = address of string
LD BC, msg_len ; BC = number of bytes to print
CALL 0x203C ; ROM routine: print BC bytes from DE
RET
message:
DB "Printed with one CALL!"
msg_len EQU $ - message
END 0x8000
The $ symbol means "the current address" in Pasmo, so $ - message calculates the string length automatically at assembly time. No need to count characters by hand.
Display a prompt, wait for the user to press a key, then print what they pressed:
ORG 0x8000
; Open channel 2
LD A, 2
CALL 0x1601
; Print prompt
LD DE, prompt
LD BC, prompt_len
CALL 0x203C
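; Wait for a keypress
; WAIT-KEY (0x15D4) reads from the current channel, and channel 2 (the
; screen) is output-only, so poll the LAST-K system variable instead.
; The ROM's interrupt routine stores each new key code there.
LD HL, 0x5C08 ; LAST-K system variable (23560)
LD (HL), 0 ; Forget any earlier keypress
wait_key:
LD A, (HL) ; A = last key code, or 0 if none yet
OR A
JR Z, wait_key ; Keep waiting until a key arrives
; Print the key that was pressed
RST 0x10 ; print it (A still holds the key)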
RET
prompt:
DB "Press a key: "
prompt_len EQU $ - prompt
END 0x8000
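Use the BEEPER ROM routine to play a tone. HL sets the pitch (lower values give higher notes) and DE sets the duration as a count of cycles, so a note of f Hz lasting t seconds needs DE = f × t and HL = 437500 / f − 30.125. The values here play middle C (about 262 Hz) for roughly one second:
ORG 0x8000
; Play a beep
LD HL, 0x066A ; pitch: 437500 / 261.63 - 30.125 = 1642 (lower = higher note)
LD DE, 0x0105 ; duration: 261 cycles, about 1 second at this pitch
CALL 0x03B5 ; BEEPER routine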
RET
END 0x8000
You might wonder: if RST 0x10 does the same as CALL 0x0010, why bother with RST at all? Two reasons:
Size: RST is 1 byte, CALL is 3 bytes. The saving applies at every place the call appears in your code, so a program with a hundred print calls is 200 bytes smaller, which mattered when memory was precious on a 48K machine.
Speed: RST executes in 11 T-states vs 17 T-states for CALL. At 3.5 MHz, every cycle counts in time-critical code like games.
The trade-off is that there are only 8 RST addresses available (0x00, 0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38). The ROM designers chose wisely which routines deserved these premium slots.
Learning Z80 assembly won't make you a faster programmer or help you ship products quicker. But it will fundamentally change how you understand computers.
You'll understand why arrays start at index 0 (because they're pointer offsets), why buffer overflows are dangerous (because there's nothing stopping you from writing past the end), why recursion can overflow the stack (because it's a finite region of memory), and why optimising compilers are miraculous (because doing it by hand is painstaking).
More than that, you'll gain appreciation for the elegant simplicity at the heart of all computing: everything is bytes, everything is addresses, everything is instructions executing one after another. The abstractions we build on top are conveniences, not necessities.
The Z80 is a beautiful teaching machine because it's complex enough to be realistic but simple enough to hold entirely in your head. When you can trace every instruction, every memory access, every flag change, you've achieved a kind of understanding that no amount of high-level programming can provide.
; 1. Write your code in a .asm file
; 2. Include the debug routines:
; INCLUDE "debug_output.asm"
; 3. Assemble:
; pasmo --tapbas yourfile.asm yourfile.tap
; 4. Load in Fuse or another Spectrum emulator
; 5. The BASIC loader auto-runs your code
The official Zilog Z80 CPU User Manual—dense but complete
'Z80 Assembly Language Programming' by Lance Leventhal—a classic text
Pasmo documentation: https://pasmo.speccy.org/
Fuse emulator: https://fuse-emulator.sourceforge.net/
World of Spectrum—ROM disassembly and Spectrum technical references
Online assemblers and emulators for immediate experimentation
Welcome to the machine.
The debug output routines introduced at the start of this guide are reproduced here for easy reference. Copy the code below into a file called debug_output.asm, or paste it directly at the end of your source files.
; =============================================================
; debug_output.asm - Debug Display Routine for ZX Spectrum
; Assemble with: pasmo --tapbas program.asm program.tap
; =============================================================
; INCLUDE this file in your program or copy the routines below.
;
; Call debug_init once at the start of your program to set up
; the screen channel. Then use:
;
; print_char - Print a single character. Input: A = char
; print_string - Print a null-terminated string. Input: HL
; print_number - Print an 8-bit number (0-255). Input: A
;
; For a newline, just do: LD A, 13 / RST 0x10
; =============================================================
; --- Call once at program start to open screen channel ---
debug_init:
LD A, 2 ; Channel 2 = main screen
CALL 0x1601 ; ROM: open channel
RET
; --- Print a single character in A ---
print_char:
RST 0x10 ; ROM: print character in A
RET
; --- Print null-terminated string at HL ---
print_string:
LD A, (HL)
OR A ; Check for null terminator
RET Z ; Return if end of string
RST 0x10 ; ROM: print character
INC HL
JR print_string
; --- Print 8-bit unsigned number in A (0-255) ---
print_number:
PUSH BC
LD B, 0
LD C, A ; BC = the number
CALL 0x2D28 ; ROM: STACK_A - push the number onto calculator stack
CALL 0x2DE3 ; ROM: PRINT_FP - print the number
POP BC
RET
; =============================================================
; END OF DEBUG ROUTINES
; =============================================================
Here is a complete example program that uses the debug routines to verify some arithmetic and conditional logic. Save this as test.asm and assemble it with:
pasmo --tapbas test.asm test.tap
Then load test.tap in your emulator (e.g. Fuse). The BASIC loader created by --tapbas will auto-run the program.
ORG 0x8000
; --- Main program ---
CALL debug_init
; Test 1: Print a string
LD HL, msg_hello
CALL print_string
LD A, 13
RST 0x10 ; Newline
; Test 2: Do some maths and print the result
LD A, 25
ADD A, 17 ; 25 + 17 = 42
CALL print_number ; Should display 42
LD A, 13
RST 0x10 ; Newline
; Test 3: Conditional logic - print PASS or FAIL
LD A, 42
CP 42 ; Is A equal to 42?
JP NZ, test_fail
LD HL, msg_pass
CALL print_string
JP test_done
test_fail:
LD HL, msg_fail
CALL print_string
test_done:
LD A, 13
RST 0x10 ; Newline
RET ; Return to BASIC
msg_hello: DB "Debug output test", 0
msg_pass: DB "Test PASSED", 0
msg_fail: DB "Test FAILED", 0
INCLUDE "debug_output.asm"
END 0x8000
When you run this program in an emulator, you should see:
Debug output test
42
Test PASSED
This confirms your arithmetic produced 42 and the conditional check passed. You can now use these routines anywhere in your code to inspect register values, verify calculations, and trace program flow.
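As a sketch of tracing program flow, the example below prints a value on each pass through a loop. It assumes debug_output.asm is saved alongside the source, as described above. The label trace_loop and the choice of C as the traced register are illustrative only. BC is saved around the output calls so the loop counter does not depend on what the ROM routines preserve.

```asm
ORG 0x8000
CALL debug_init
LD B, 5 ; loop counter: five iterations
LD C, 1 ; value being traced
trace_loop:
PUSH BC ; protect B and C across the ROM calls
LD A, C
CALL print_number ; show the current value of C
LD A, 13
RST 0x10 ; newline
POP BC
INC C ; next value
DJNZ trace_loop ; decrement B, repeat until zero
RET ; return to BASIC
INCLUDE "debug_output.asm"
END 0x8000
```

Run in an emulator, this should print the numbers 1 to 5, one per line. The same pattern works for any register you want to watch: copy it into A and call print_number at the point of interest.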