Deconstruction of a 16 byte demo part 2

Part 2 in a series of blog posts. Part 1

In the previous post, we saw how our program writes coloured pixels across the same row in a loop, with an outer loop decrementing the row.

Here's a reminder of what the program looks like:

X:  add al,13h
    int 10h
    mov al,cl
    xor al,ch
    or ax,0CDBh
    loop X
    dec dx
    jmp short X

In this post, I'm going to explain how this program leverages wrapping behavior when setting pixels.

I initially thought that the outer loop was responsible for iterating successively over rows as part of a drawing routine that drew column by column, row by row.

However, I discovered that's not the case! Disabling the outer loop entirely by commenting it out (the dec dx and the jmp short X, 3 bytes in total) still caused a chessboard to be drawn on the screen, but without a scrolling effect. You can actually draw a chessboard in DOS with 13 bytes!

DOSBox debugger

I had a hypothesis that there was some kind of line wrapping going on, but I couldn't quite visualise it. So my instinct was to reach for some tooling to see if I could get more insight.

I couldn't get the DOSBox debugger to work properly. I compiled my own DOSBox from trunk using MinGW which all went fine, but the debugger itself didn't seem to respond properly to keyboard shortcuts so I couldn't step through the code. It seems like there were other problems with the build as well - the screen wasn't properly updating. I suspect somehow the instructions to compile trunk aren't up to date, and I need a different version of one of the dependencies. Or hey maybe trunk's just broken, who knows.

Here's a screenshot of it paused, clearly showing our program frozen in time with the values of the registers - looks great if it'd respond to my keyboard strokes!

DOSBox Debugger

Slowing down the renderer

After a little bit of hacking away at the code to break it in ways that confirmed a wrapping hypothesis, I decided to add a NOP loop to each loop to slow down drawing and see if I could get a really clear animation of what was happening. Here's what I added to the inner loop:

    mov bx,1000
N:  nop
    dec bx
    cmp bx,0
    jne N

This is the equivalent of

int x = 1000;
while (x != 0) {
    /* nop */
    x--;
}

This slowdown helped to produce a fairly clear cut animation of the chessboard being drawn.

Slowly drawn chessboard

Note that you can just slow DOSBox down, but I couldn't slow it down enough to produce such a clear result without inserting no-ops.

What's going on?

Pixel Wrapping

Recall that in our loop, CX takes every value in the range 0-65535, and it controls the x value of the pixel being set. The odd thing about this is that the displayable screen in the mode we've set is only 320 pixels wide. My initial assumption when reading this code was that pixels set with x >= 320 would be clipped, as that's how things usually work in modern graphics frameworks in which you're drawing to some sort of viewport - but that's not the case here. It turns out that pixels with an x offset of 320 or more wrap to the next line, and if the offset is 640 or more, they wrap to the line after that, and so on.

I worked this out by trial and error. It's not documented in any of the references I was looking at, although I bet it's in some VGA programming black books and everyone who's ever done VGA programming knows about it. Especially demo scene people. Fun!
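The wrapping behaviour can be sketched as a small Python model. This only illustrates what was observed on screen, it is not the BIOS code; the names SCREEN_WIDTH and wrapped_position are made up for this example, and the pixel is assumed to be written on row 0.

```python
SCREEN_WIDTH = 320  # visible width in the 320x200 mode used here


def wrapped_position(x, width=SCREEN_WIDTH):
    """Return (row, column) where a pixel written at (x, 0) shows up.

    Hypothetical helper: it models the observed wrapping and nothing more.
    """
    row, column = divmod(x, width)
    return row, column


if __name__ == "__main__":
    for x in (319, 320, 639, 640, 1000):
        print(x, "->", wrapped_position(x))
```

An x of 319 is still on the first row, while 320 is the first pixel of the second row and 640 the first pixel of the third.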

My hypothesis for why this is the case is that when the interrupt handler sets pixel values, it's actually writing directly to video memory, which is just some area in memory interpreted by the hardware as a two dimensional pixel array. So when the program triggers interrupt 10h to set a pixel at (x, y), the handler just sets a value in memory at the address (video memory offset) + y * 320 + x.

For example, setting a pixel at (100, 200) means setting the memory at (video memory offset) + 200 * 320 + 100, or in other words (offset) + 64100. Well, you can also write to that very same pixel as (64100, 0) or (video memory offset) + 0 * 320 + 64100 - which is what this program does. Using this technique, it can write an entire screen of pixels with just one loop.
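Here's that arithmetic as a minimal Python sketch, assuming the y * 320 + x hypothesis above is right. The function name linear_offset is invented for the illustration, and the result is relative to the start of video memory.

```python
WIDTH = 320   # bytes per row, one byte per pixel
HEIGHT = 200  # visible rows


def linear_offset(x, y, width=WIDTH):
    """Offset from the start of video memory, per the hypothesis above."""
    return y * width + x


if __name__ == "__main__":
    # Two different (x, y) pairs that name the same memory location.
    print(linear_offset(100, 200))
    print(linear_offset(64100, 0))

    # Sweeping x alone along row 0 reaches every visible pixel exactly once.
    touched = {linear_offset(x, 0) for x in range(WIDTH * HEIGHT)}
    print(len(touched))
```

The last part is the point of the trick: with y held fixed, a single counter running over x is enough to cover the whole 320x200 screen.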

It seems to be unusual and very slow to set pixels using this interrupt handler. The normal way of doing it is to just write directly to a memory mapped region at A000:0000. I think the reason this is being done is to fit into 16 bytes!

y-offset and scrolling

The outer loop, by decrementing DX, decrements the y-offset at which this long wrapped line starts to be drawn. So each time the chessboard advances up by one row, that's DX being decremented once. DX, like CX, also takes on values (0-65535) that are greater than the vertical pixel height of the screen (200), so what happens then?

Well, the underlying two dimensional pixel array has 65,536 entries (64K, slightly more than the 64,000 pixels of 320x200, because computers like powers of 2 - there's a hidden area at the bottom of the screen!) and so supports a 16-bit unsigned integer index. When that index (y * 320 + x) doesn't fit in 16 bits, my hypothesis is that wrapping occurs as a result of 16-bit integer overflow. This actually happens whenever DX (the y-offset) is greater than 204, which is most of the time! It turns out that the effect of this is to cause drawing to wrap around from the bottom to the top of the screen and happily continue.
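The overflow hypothesis can be checked with a few lines of Python. This is a sketch under the assumption that the index really is truncated to 16 bits; wrapped_offset and first_overflowing_row are names invented for the example.

```python
WIDTH = 320
VISIBLE_PIXELS = 320 * 200  # 64000
INDEX_MASK = 0xFFFF         # assumed 16-bit index, per the hypothesis


def wrapped_offset(x, y):
    """Index y * 320 + x, truncated to 16 bits."""
    return (y * WIDTH + x) & INDEX_MASK


def first_overflowing_row():
    """Smallest y for which y * 320 alone no longer fits in 16 bits."""
    y = 0
    while y * WIDTH <= INDEX_MASK:
        y += 1
    return y


if __name__ == "__main__":
    print(first_overflowing_row())
    print(wrapped_offset(0, 204), wrapped_offset(0, 205))
    print((INDEX_MASK + 1) - VISIBLE_PIXELS)  # size of the hidden area
```

Under this model, row 204 still fits, row 205 wraps back to a small offset near the top of the screen, and the hidden area past the last visible pixel is 1536 bytes, a little under five rows.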

I expected there might be a scrolling glitch when DX reaches 0 and wraps around to 65535, but I couldn't spot it. I think this is because the chessboard pattern is vertically cyclical with a period of 64 pixels (each square is 32 pixels high, and the checkered pattern repeats every two squares), and 65536 is divisible by 64, so the wrapping doesn't produce a change of phase and therefore there's no glitch.

I confirmed this by looking at the output when DX is hard-coded to 0 (on the left) and then 65535 (on the right):

DX = 0 on the left, DX = 65535 on the right

They're almost identical, but if you look closely at the top edges of each, you can see that on the right the chessboard has shifted its pattern upwards a little bit. You can also see on closer inspection that the patterns are a bit wonky and not actually all lining up! We'll get into that in the next post.
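The numbers behind the DX wraparound can be worked through in Python. This is a sketch that assumes the 16-bit index model and the 64-pixel pattern period described in this post; start_offset is a made-up name for where the long wrapped line begins for a given DX.

```python
WIDTH = 320
INDEX_SPACE = 1 << 16  # 65536 possible 16-bit index values (assumed)
PATTERN_PERIOD = 64    # vertical period of the chessboard, per this post


def start_offset(dx):
    """Where the wrapped line starts for a given DX, modulo 16 bits."""
    return (dx * WIDTH) % INDEX_SPACE


if __name__ == "__main__":
    # 65536 is a whole number of pattern periods.
    print(INDEX_SPACE % PATTERN_PERIOD)

    # Going from DX = 0 to DX = 65535 moves the start by one row of 320.
    print(start_offset(0), start_offset(65535))
    print((start_offset(0) - start_offset(65535)) % INDEX_SPACE)
```

In this model, DX = 65535 starts exactly 320 entries before DX = 0, which is the same one-row step as any other decrement and fits with the small upward shift visible in the comparison.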

Lastly, the consequence of the CX and DX registers being uninitialized, and therefore set to weird DOS-determined values, is that where drawing starts for the first long wrapped row is a bit random - it's kind of starting halfway through a chessboard. This causes a noticeable flicker when the program starts if you look out for it, and you can see it in the slowly drawn chessboard animation earlier in this post, as the first "frame" is drawn starting halfway up the screen, with the rest black. This is quickly drawn over by the first chessboard and the artefact vanishes.

In the next post, we'll look into how the chessboard pattern is generated.

References