I occasionally see cool demoscene stuff on Twitter, and I thought it'd be fun to deconstruct a simple example to see how it worked.
I'm not very familiar with x86 assembler and DOS programming, so this will go slow - I'll talk about all my findings along the way and how I worked things out. If you are familiar with this area, this will bore the hell out of you. But if you're not, you might enjoy following along.
Check out the GIF here before proceeding. I will not dwell on how absolutely remarkable it is that 16 bytes is enough to produce a plausible animated effect.
I worked through this mostly on Windows 10, although I used WSL for things like supporting scripts when working stuff out.
pouet.net provides .asm source code and a built .com file for the demo. The .com file worked perfectly in DOSBox, although the scrolling was a bit slow until I increased the simulated cycle speed.
Building x86 Assembler
I decided to use NASM. I don't know this space at all, but it's easy to use with approachable docs. Building the .asm was very easy - just a matter of running nasm kasparov.asm -o kasparov.com. Sure enough, it produced a 16-byte .com file!
Grokking the code
I did some assembler in university, but not x86. I've always wanted to learn a little more, and there are only 8 instructions to understand. How hard can it be?
Here's the program.
```nasm
1   X: add al,13h
2      int 10h
3      mov al,cl
4      xor al,ch
5      or ax,0CDBh
6      loop X
7      dec dx
8      jmp short X
```
A detailed rundown of what I worked out follows.
1 X: add al,13h
Note that there's no prelude, header, data segment or anything like that - this gets right down to business and starts issuing instructions.
X is a label, which means this instruction can be jumped to - we'll get to that later.
The file is written in "Intel Syntax" which means that the destination comes before the source.
This instruction adds 13h (hex notation is designated by the trailing h) to the lower byte of the AX register (designated by AL). The AX register is initialised by DOS to zero, so this effectively sets AX to 13h. Using add rather than just directly setting it to 13h is an optimization which we'll get to later on.
Intel 80386 and higher CPUs have 4 general purpose 32 bit registers:
They can be used as 32 bit, 16 bit, or 8 bit registers, depending on how you refer to them:
- All 32 bits: EAX, EBX, ECX, EDX
- The least significant 16 bits: AX, BX, CX, DX
- Bits 8-15: AH, BH, CH, DH
- The least significant 8 bits: AL, BL, CL, DL
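To make the aliasing concrete, here's a quick Python sketch (my own illustration, not how the CPU stores anything - it's just bit-masking) showing how the AX, AH and AL views relate to a 32-bit EAX value:

```python
# Model the sub-register views of a 32-bit register with bit-masking.
eax = 0x12345678

ax = eax & 0xFFFF        # least significant 16 bits
ah = (eax >> 8) & 0xFF   # bits 8-15
al = eax & 0xFF          # least significant 8 bits

print(hex(ax), hex(ah), hex(al))  # 0x5678 0x56 0x78
```

The key point for this program is that writing AL or AH changes the corresponding byte of AX in place.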
2 int 10h
This triggers interrupt 10h. Triggering an interrupt is how DOS programs ask the BIOS, DOS, or some piece of hardware to do things. The CPU jumps to some other preloaded function which interprets the arguments to the interrupt based on the values of certain registers. More modern operating systems and hardware now have different mechanisms for making system calls but the principle is the same.
On PCs, interrupt 10h is installed by the BIOS when the machine powers on, and provides a simple API for displaying graphics. There's a basic API that you can assume all PCs will support, plus some extensions which will be hardware specific. I'm not sure if the BIOS of modern PCs still loads video functions at this interrupt, but DOSBox emulates it.
Interrupt 10h does different things depending on the AH register. In this case, it's 0, which means set the video mode according to the value of the AL register. This instruction is called in a loop as we'll see later, but the first time around, register AL is 13h, which is VGA - 320x200 pixels in 256 glorious colours.
Mysterious XORing and ORing
3 mov al,cl
4 xor al,ch
5 or ax,0CDBh
In the first instruction, we copy the contents of CL into AL. The first time this instruction runs, CX hasn't been set by the program, but DOS has initialized it to 00FFh, so AL will now be FFh.

We then xor AL with CH. Or in other words, set AL = CL XOR CH.

In the third instruction, we do AX = AX OR 0CDBh (a constant). Or in other words, breaking down the AX register into two 8-bit registers: AH prior to this instruction was 0, so now it's 0Ch (0 OR 0Ch). AL prior to this instruction was FFh, so it remains FFh (FFh OR DBh).

Let's set aside what the point of this is for one moment, but the post-condition is that AH is 0Ch and AL is (CL XOR CH) OR DBh.
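To check my understanding, I traced the first pass through these three instructions in Python (a simulation of mine, assuming the int 10h call on line 2 leaves AX untouched):

```python
# First pass through lines 3-5, simulated with 16-bit arithmetic.
ax = 0x0013               # AL was set to 13h on line 1
cx = 0x00FF               # DOS-initialised value

cl, ch = cx & 0xFF, cx >> 8

al = cl                   # mov al,cl  -> AL = FFh
al ^= ch                  # xor al,ch  -> AL = FFh XOR 00h = FFh
ax = ((ax & 0xFF00) | al) | 0x0CDB    # or ax,0CDBh

print(hex(ax))            # AH is now 0Ch, AL is FFh
```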
6 loop X
This instruction decrements CX and jumps back to X unless CX has become 0.
Whether it decrements CX (a 16 bit register) or ECX (a 32 bit register) depends on the address-size attribute of the instruction. To the best of what I could find out, unless the attribute is explicitly set in the code (which it isn't here), this defaults to 16-bit when the CPU is in real mode. And I'm old enough to know that DOS programs run strictly in real mode, which prevents them from executing if you have already started Windows, which switches the CPU to protected mode.
CX is a 16 bit register still not set by this program, so at this point it's 00FFh, or 255 in decimal. This means that there will be 255 iterations of the code between this line and X, with CX decremented at each iteration. When CX hits 0, control proceeds to the next instruction.
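The decrement-and-branch behaviour of loop can be sketched like this (a Python stand-in for the real instruction, just to confirm the iteration count):

```python
# Emulate `loop X` with CX starting at its DOS-initialised value.
cx = 0x00FF
iterations = 0
while True:
    iterations += 1           # lines 1-5 would run here
    cx = (cx - 1) & 0xFFFF    # loop decrements CX...
    if cx == 0:               # ...and only falls through at zero
        break

print(iterations)  # 255
```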
What's the point of this looping?
Well, in all subsequent iterations, AH will always be 0Ch. We only ever write to it on line 5 as part of or ax,0CDBh, which always leaves 0Ch in AH.
As explained above, AH controls what interrupt 10h does - it selects which functionality is invoked. In the first call, it was set to 0, which caused the video mode to be set, but now it's set to 0Ch. And we now know that this interrupt is being called every iteration in a loop.
With AH set to 0Ch, interrupt 10h (on line 2) changes the color of a single pixel, using:

- AL to set the color
- CX to set the column
- DX to set the row
Okay, now things are starting to make sense. We're writing coloured pixels in a loop - not changing the row but decrementing the column. And we're doing some funky XORing to change the color every iteration. So this loop draws a long horizontal patterned line. That makes sense for something that's supposedly drawing a chessboard - the colour sequence must be responsible for the tiled appearance somehow.
Last bit of the code:
7 dec dx
8 jmp short X
This decrements DX, which is responsible for the pixel row in the interrupt 10h call, and jumps back to the top of the program. This effectively makes the program loop infinitely.
Note that when DX reaches 0, decrementing it here causes an integer underflow and DX wraps around from 0 to 65535 (the highest 16 bit number). There will then be 65535 iterations before DX again becomes 0 and wraps around again. This behaviour also applies to the CX register, and means that after the initial 255 iterations of the inner loop (lines 1 to 6) there will be 65535 inner loop iterations for every iteration of the outer loop.
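The underflow is easy to demonstrate with 16-bit masking (again a Python sketch of mine, not the program itself):

```python
# dec on a 16-bit register that already holds 0 wraps to 65535.
dx = 0
dx = (dx - 1) & 0xFFFF
print(dx)  # 65535

# It then takes 65535 further decrements to get back to 0.
steps = 0
while dx != 0:
    dx = (dx - 1) & 0xFFFF
    steps += 1
print(steps)  # 65535
```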
DX is initialized by DOS to the value of the CS register, which effectively makes it something random when the program starts.
So now we have the broad outline - the program sets VGA mode, then does something like this pseudo-code. I've abstracted the XORing and ORing into a function get_color for readability, and the graphics operations into write_pixel.
```
// x and y are whatever DOS sets them to.
uint16 x = 255
uint16 y = an undefined 16 bit integer

while true:
    // Write a row of pixels from x down to 0.
    //
    // The first row will be 255 pixels long.
    // Rows after the first will be 65535 pixels long.
    //
    // When writing rows after the first, x wraps around from 0
    // to 65535 on the first iteration.
    do:
        c = get_color(x, c)
        write_pixel(x, y, c)
        x--
    while x > 0

    // Decrement the y offset of the next row.
    y--
```
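The pseudo-code translates almost mechanically into Python. This is my own simulation, not the original program: get_color implements the XOR/OR from lines 3-5 (only x matters for the result, since mov al,cl overwrites whatever colour was in AL before), and I just collect the first row's colours rather than drawing anything.

```python
def get_color(x):
    # Lines 3-5: AL = (CL XOR CH) OR DBh, where CL/CH are the bytes of x.
    cl, ch = x & 0xFF, (x >> 8) & 0xFF
    return (cl ^ ch) | 0xDB

# First row: the loop body runs with x = 255 down to 1.
first_row = [get_color(x) for x in range(255, 0, -1)]

print(len(first_row))          # 255 pixels
print(sorted(set(first_row)))  # the distinct palette indices used
```

Interestingly, ORing with DBh means only two bits of the XOR result survive, so the whole first row uses just four distinct colours - a first hint at where the tiled pattern comes from.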
So we just need to understand why that code produces something that looks like a scrolling chessboard, which is the subject of the next post :)
Here are some references I used: