Friday, July 27, 2012

Raspberry Pi Simple LED flash

So, although you can't see it, that LED is changing between on and off every second.

The LED is connected into pin 11 via a 1K2 ohm resistor and the other LED leg goes to 3v3 (pin 0). It wasn't difficult to do once I was on an unfiltered Internet connection (to install the necessary software onto Raspbian-17/5/2012).

First I had to:
sudo apt-get install python-dev
This is to install the Python development libraries - Python is already installed but the I/O library needs the headers and stuff to build. Next I downloaded RPi.GPIO-0.3.1a.tar.gz then:
tar zxf RPi.GPIO-0.3.1a.tar.gz
cd RPi.GPIO-0.3.1a
sudo python install
And we also need a program, which looks like this:
import RPi.GPIO as GPIO
import time 

print "Setup pin 11"
GPIO.setup(11, GPIO.OUT)

while true:
  print "Set output false"
  GPIO.output(11, False)
  print "Set output true"
  GPIO.output(11, True)

All of this is based on the instructions here, although they are very slightly out of date. 


Sunday, July 22, 2012

More Raspberry Pi Printing Tests

Here is some evidence to see if the scroll is the limiting factor for the Raspberry Pi on a local session.

(For more precise machine specifications, see previous blog post)

Some Basic Analysis of the Results

All the results are much faster without a newline per number.

In the case of HDMI Screen ‘for_print_comma” - I’m not sure why it takes 8ms to print 30 numbers since three times that number fit onto a single line. This seems to suggest an effect that is not the scroll speed.

It’s also interesting that both the SSHed results are faster - I’m guessing that the newline is causing an expensive flush (and maybe a new packet) each time. The ‘for_print_comma’ speeds are now of the same order.

The local Terminal result on the Core i7 is also faster - which would be related to the newline flush and the scroll speed not being CPU independent.

I also still don’t understand why the ‘for_print’ on the Raspberry Pi is so much slower than the Dreamplug. They are both ‘similar’ processors, running Debian Linux (albeit two major versions apart).

Perhaps the slow response of the HDMI screen and the SSH to newlines is related?


Raspberry Pi - HDMI Screen

for_print = 70279.1 microseconds/function ( total = 70.2761 s)
for_print_comma = 8067.89 microseconds/function ( total = 8.06789 s)

Raspberry Pi - SSH

for_print = 12303.9 microseconds/function ( total = 12.3039 s)
for_print_comma = 605.202 microseconds/function ( total = 0.605202 s)

Dreamplug - SSH

for_print = 857.302 microseconds/function ( total = 0.857302 s)
for_print_comma = 365.109 microseconds/function ( total = 0.365109 s)

Core i7 - Terminal

for_print = 161.338 microseconds/function ( total = 0.161338 s)
for_print_comma = 16.202 microseconds/function ( total = 0.016202 s)

Source Code

# Various really stupid Python benchmarking tests
# Rob Probin 2012

import time
import math

def for_print():
    for i in xrange(30):
        print i

def for_print_comma():
    for i in xrange(30):
        print i,
test_suite = [
    ("for_print", 1000),
    ("for_print_comma", 1000),

if __name__ == '__main__':   
    result = []
    from timeit import Timer
    for test in test_suite:
        test_name, passes = test
        print("RUNNING TEST:", test_name)
        t = Timer(test_name+"()", "from __main__ import "+test_name)
        test_time = t.timeit(number=passes)
        test_results = (test_name, test_time, test_time/passes)

    for result_set in result:
        (name, total, func_time) = result_set
        print "%s = %g microseconds/function ( total = %g s)" % (name, func_time*1000000, total)

Saturday, July 21, 2012

Raspberry Pi Python Timings

Someone interestingly produced some Python output tests (here) which showed a problem with the I/O speed on Python at the moment on the Raspberry Pi - maybe due to the technique of open/closed files for userland I/O (unconfirmed guess based on this).

When discussing this, my good friend Stu asked me to time how long it took Python to print out some numbers on a Raspberry Pi. Well, obviously that was way too simple. I came up with 7 very silly tests and used the Python module ‘timeit’, which tries to avoid some common problems when timing code. Although, as can be seen from the two runs on Raspberry Pi on SSH, these timings are interestingly inconsistent.

Lastly a warning about the results below: There are a lot of variables (memory, python versions, exactly what background tasks there are, the immaturity of Raspbian (the Debian/Linux operating system distribution)). Also the Python tests are very stupid and overly simple, and, as with all benchmarks, don’t represent real application in the slightest. Therefore I’d warn against taking these seriously at all. Also lots of people have already done real applications. I’d like to time these on an equivalently priced 8-bit (or even 32-bit) single-chip Microcontroller board. Oh, that’s right, Python wouldn’t even run....

Some Basic Analysis of the Results

The ‘normal PC’-type (expensive) computer is faster; not surprising to me.

The Dreamplug is slower at floating point (it doesn’t have hardware floating point) , but faster at the list and empty function tests - probably because the 70% increase in speed directly relates to clock speed and the ARMv5 to ARMv6 extensions don’t make up for this basic difference.

The printing tests are very interesting, and I haven’t really had enough time to consider why some of these are so strange, especially the Dreamplug vs. Raspberry Pi on the SSH tests (both are wired via 100 Mbit/s Ethernet into a WiFi router).

One thing is clear: printing and probably specifically scrolling the screen up one line on a real FullHD (1920x1080) screen is SLOW on the Raspberry Pi because it doesn’t have any hardware acceleration enabled - pushing two million pixels (maybe 6 or 8 millions bytes) to print EACH number is a lot of work. (2.3ms work, by the look of it, or 1.6 million clock cycles if my calculations are correct).

Future Tests

Print to the same line on the print tests. (just on screen on Raspberry Pi) - either just without newlines or by resetting the cursor back to the line start, therefore avoiding ALL scrolling.

Run the same tests in another scripting language and in C.


2012-07-15-wheezy-raspbian Raspberry Pi ARM1176JZF ARMv6 700MHz
SDRAM: 256 MB (shared with GPU)
Python 2.7.3rc2 via SSH terminal session (Wired Ethernet)
Text Shell only, no other apps running

while_print = 11724 microseconds/function ( total = 117.24 s)
for_range_print = 11643.1 microseconds/function ( total = 116.431 s)
for_xrange_print = 11633.1 microseconds/function ( total = 116.331 s)
list_append = 365.965 microseconds/function ( total = 36.5965 s)
empty_func = 3.13354 microseconds/function ( total = 3.13354 s)
sys_sleep_func = 100184 microseconds/function ( total = 10.0184 s)

RUN2 (with float test added):
while_print = 11506.4 microseconds/function ( total = 115.064 s)
for_range_print = 11401.8 microseconds/function ( total = 114.018 s)
for_xrange_print = 11381.6 microseconds/function ( total = 113.816 s)
list_append = 313.054 microseconds/function ( total = 31.3054 s)
empty_func = 2.17917 microseconds/function ( total = 2.17917 s)
sys_sleep_func = 100188 microseconds/function ( total = 10.0188 s)
float_func = 10.9426 microseconds/function ( total = 10.9426 s)

2012-07-15-wheezy-raspbian Raspberry Pi ARM1176JZF ARMv6 700MHz
SDRAM: 256 MB (shared with GPU)
Python 2.7.3rc2 terminal on HDMI with Full HD frame buffer (no acceleration)
Text Shell only, no other apps running

while_print = 70540.2 microseconds/function ( total = 705.402 s)
for_range_print = 70276.7 microseconds/function ( total = 702.767 s)
for_xrange_print = 72477.2 microseconds/function ( total = 724.772 s)
list_append = 424.997 microseconds/function ( total = 42.4997 s)
empty_func = 3.98467 microseconds/function ( total = 3.98467 s)
sys_sleep_func = 100213 microseconds/function ( total = 10.0213 s)
float_func = 12.9488 microseconds/function ( total = 12.9488 s)

2012-07-15-wheezy-raspbian Raspberry Pi ARM1176JZF ARMv6 700MHz
SDRAM: 256 MB (shared with GPU)
Python 2.7.3rc2 terminal on HDMI with Full HD frame buffer (no acceleration) - print to file (on a slow 2GB SD card)
Text Shell only, no other apps running

python > results.txt
tail results.txt

while_print = 632.781 microseconds/function ( total = 6.32781 s)
for_range_print = 670.373 microseconds/function ( total = 6.70373 s)
for_xrange_print = 647.219 microseconds/function ( total = 6.47219 s)
list_append = 465.027 microseconds/function ( total = 46.5027 s)
empty_func = 3.63238 microseconds/function ( total = 3.63238 s)
sys_sleep_func = 100196 microseconds/function ( total = 10.0196 s)
float_func = 15.9253 microseconds/function ( total = 15.9253 s)

Debian Lenny Dreamplug Marvell Sheeva ARMv5 (no FPU) 1.2 GHz
SDRAM: 512MB 800 MHz 16bit DDR2
Python 2.5.2 via SSH terminal session (Wired Ethernet)
Text Shell only, no other apps running

while_print = 954.053 microseconds/function ( total = 9.54053 s)
for_range_print = 935.227 microseconds/function ( total = 9.35227 s)
for_xrange_print = 932.759 microseconds/function ( total = 9.32759 s)
list_append = 193.037 microseconds/function ( total = 19.3037 s)
empty_func = 1.55294 microseconds/function ( total = 1.55294 s)
sys_sleep_func = 100203 microseconds/function ( total = 10.0203 s)
float_func = 17.3767 microseconds/function ( total = 17.3767 s)

Mac OS X 10.6.6 Core i7 (Dual with HT) 2.66GHz
SDRAM: 4 GB 1067 MHz DDR3
Python 2.6.6 in Terminal
6 apps running

while_print = 159.976 microseconds/function ( total = 1.59976 s)
for_range_print = 151.213 microseconds/function ( total = 1.51213 s)
for_xrange_print = 154.768 microseconds/function ( total = 1.54768 s)
list_append = 19.8449 microseconds/function ( total = 1.98449 s)
empty_func = 0.161463 microseconds/function ( total = 0.161463 s)
sys_sleep_func = 100101 microseconds/function ( total = 10.0101 s)
float_func = 0.347249 microseconds/function ( total = 0.347249 s)

Source Code

# Various really stupid Python benchmarking tests
# Rob Probin 2012

import time
import math

def while_print():
    i = 1
    while i <= 30:
        print i

def for_range_print():
    for i in range(30):
        print i

def for_xrange_print():
    for i in xrange(30):
        print i

def list_append():
    L = []
    for i in range(100):

def empty_func():
def sys_sleep_func():

def float_func():
    k = 1.5
    return math.sqrt(k*3.45)
test_suite = [
    ("while_print", 10000),
    ("for_range_print", 10000),
    ("for_xrange_print", 10000),
    ("list_append", 100000),
    ("empty_func", 1000000),
    ("sys_sleep_func", 100),
    ("float_func", 1000000),

if __name__ == '__main__':   
    result = []
    from timeit import Timer
    for test in test_suite:
        test_name, passes = test
        print("RUNNING TEST:", test_name)
        t = Timer(test_name+"()", "from __main__ import "+test_name)
        test_time = t.timeit(number=passes)
        test_results = (test_name, test_time, test_time/passes)

    for result_set in result:
        (name, total, func_time) = result_set
        print "%s = %g microseconds/function ( total = %g s)" % (name, func_time*1000000, total)

Sunday, July 15, 2012

Gulpman (Maze Game) Monster Movement

The 1982 ZX Spectrum game I cloned last year (with new code) was called Gulpman written by John A. Campbell. This clone was written without any knowledge of the internals. Over the last two weeks I’ve taken apart the original - and have a pretty much complete annotated disassembly. See my previous blog post for more information on this (including the disassembly).

You can find information about the game here and even play it online (or actual size with instructions here Sadly the web-emulated version is without the fantastic sound effects; download an emulator for the best experience! [Edit 7-Oct-2012: See this post about better web-based emulation]

In this post I’ll detail my understanding of the monster movement algorithm used, based on this disassembly. This is one of two relatively complex algorithms in the code - the other being the (automatic) demo player. All other functions - although long - are straightforward and relatively obvious.

As a terminology note: what I refer to as ‘monsters’ in this article are referred to by the instructions as ‘four grumpy chasers’.

Since the algorithm details are quite long, I’ve put the ‘final’ paragraphs first... (for the less-techie readers).

Movement Summary

If I had to summarise the movement algorithm it would be:
“It will always try to go toward the player, but if that fails it will step in a different direction (favouring perpendicular, i.e. 90 degrees. a right angle) repeatedly until it can either move towards the player or it is blocked from moving in that direction by something”
However, this misses out a huge amount of detail that effects real-world movement.


The algorithm works surprisingly well based on it’s size. You hardly ever see it get stuck (i.e. a monster hovering looking for a way to go) and it navigates around walls surprisingly well.

Part of this is helped by the maze layouts and the fact there are another three other monsters chasing you.

One case of ‘getting stuck’ is in the ’N’ or ‘U’ of the Maze ‘O’ Gulp-man maze. However, this doesn’t offer much help - since it’s very difficult to get all of the monsters stuck at the same time... (although I wouldn’t say that was impossible... just unlikely)

More Info

There are some more comments from John himself  in the emulation permission for World of Spectrum where he talks about “being pleased with the algorithm” (see ).

For real retro-interest, there was a ZX81 version as well: (and click the play button to run the file).

I’d be willing to post both the cloned version and the original ZX Spectrum annotated disassembly on-line, but really need to get in contact with Mr Campbell to do this (to get copyright permission).

Algorithms in Other Games

If you are interesting in these things, it’s also worth looking at the PAC-MAN ghost behaviour which isn’t as simple as it sounds either and

There is further discussion of possible maze movement techniques here:


Monster vs. Player Speed

Part of the monster interaction is determined by the speed of the monsters.

The whole game speed is set by ‘Tempo’ which simply slows up the whole game loop by a delay. However, this does not affect the monsters algorithm in itself - except the whole game runs faster and the human needs to react quicker (and hence does effect difficulty somewhat).

The player moves every 16 game loops. The all the monsters then move at a separate pace.

This pace is set by two parameters parameters:
  • the ‘Grade’ setting - to quote the instructions “the relative start speed of the [monsters]”
  • Player success - to quote the instructions “every 1000 points [monsters] go a little faster”

This is a snippet from my annotated disassembly:

; reset monster speed
; this is made up of several components... grade,
L69B5: 3E 40        LD A,40h
L69B7: DD 96 A4     SUB (IX+0A4h)    ; 632A-5Ch = 62CEh = grade_number '1' (31h) to '9' (39h)
                                    ; A result will be 15 for easy and 7 for hard
L69BA: 87           ADD A            ; A=A*2, so easy is 30 and 14 is hard.
L69BB: DD 86 12     ADD (IX+12h)    ; 632A+12=Player Success. 14 at start. 0 at lowest.
                                    ; A total = 44 or 14.
L69BE: C6 08        ADD 8h            ; Give the player more advantage.
                                    ; A total = 52 or 22.
L69C0: DD 77 1F     LD (IX+1Fh),A    ; 632A+1F=monster speed, load with new calculated value

So for those of you that don’t read Z80 assembly, that is:

    monster_speed_count = ((64 - grade_number_ascii) * 2) + player_success + 8

Grade number is the ASCII value (‘grade_number_ascii’) - as per the ‘GRADE’ game setting, where ASCII ‘1’ is 49 decimal and ASCII ‘9’ is 57 decimal. T’he setting of 9’ is harder.

The player success (‘player_success’) starts at 14 and decreases by 2 (bottom 0) every 1000 points earned. So only the first 7 bonuses affect monster speed. (What I called ‘player success’ might be a bad name for this variable, but still, I’m sure you get the idea.)

This monster_speed_count is decremented until it’s zero. When it’s zero the above code is run to set the next monster run speed and the four monsters are all individually run.

Note: the monsters always run slower than the player. Based on my quick calculations that means that the monsters run at 325% slower than the player at slowest and 37.5% slower at fastest.

The equation seems logical - make the game harder based on both requested hardness and on player experience and skill, but I expect the constants in this equation were arrived at by trial and error (as is often the case).

Data Structure

Each of the four monsters have a 8-byte data structure as follows:

Byte    Action
 0    monster code - same as UDG (user defined graphic) character code
 1    default position as address in map data (least significant byte)
 2    default position as address in map data (most significant byte)
 3    monster_position as address in map data (least significant byte)
 4    monster_position as address in map data (most significant byte)
 5    what is under the monster? (e.g. an apple or a space)
 6    second choice offset if primary fails? (0=no primary/secondary, i.e. back up system off)
 7    primary offset to move in ... "Primary choice" in this case is the way we REALLY want to go towards the player!

Normal Case Algorithm

Terms used here:
    Column represents the horizontal position, i.e. left and right.
    Line represents the vertical position, i.e. up and down.

In the normal (basic) case we first try to get to the same column as the player - i.e. we follow the player left and right. After we get the same column (or the way we want to go left-or-right is blocked by a wall or another monster) we try to get the same line. If both of these are blocked then we run some ‘special-case’ algorithm which is detailed in the next section.

On the face of it this basic column-then-line method might appear naive, but consider the following:

  • The playing area is wider than it is high so it’s somewhat logical to choose a column-based left-right direction first.
  • The monster movement controller subroutine (function) may well run when the columns are the same if the player passes us underneath - so this then causes the line match mode to run - making the monster (try to) move up and down. Since on most mazes (and in most cases on all mazes) movement in this direction will be blocked, hence causing the special case to run.
These are well worth expanding:

  1. It introduces some level of randomness into the process when the columns are the same and the player passes underneath, since the player cannot easily predict if the monster is simply going to follow the player back the other way, or the special case code takes over and it goes the other way.
  2. You can see it clearly on Maze ‘O’ the GULP-MAN maze when a monster is in an open area and the player repeatedly passes left-and-right; hence the special case code is never run and column matching overrides line. However, it still doesn’t give the player very much respite in a real game with three other monsters heading towards him and the fourth getting slowly closer!

In a lot of simple maze games that just try to follow the player, the monsters get stuck behind walls (because they are taking some form of direct route to the player) and this usually results in a very boring game. Some strategy is usually required to overcome simple-follower stupidity.

Special Case Algorithm - Summary

The special case algorithm has to choose a direction to travel that is not (necessary) towards the player. As soon as it discovers these special case modes allow a move towards the direction the player was in or is blocked (by a wall or monster) it reverts to trying to chaise the payer (normal case algorithm).

The algorithm has a small feature to mix things up a bit and a slightly different behaviour if the player is on the same line.

Special Case Algorithm - Detail

So, when the direction towards the player is blocked in both line and column direction (normal case algorithm), this ‘special case’ code is run.

This piece of code (30 lines of Z80 assembler, plus two tiny tables) is relatively complex for it’s size.

First it sets up a primary direction towards the player.

The primary direction is straight forward:

  1. up if the player is on a higher line
  2. down if the player is on a lower line
  3. right if the lines are the same and the player is to the right
  4. left if the lines are the same and the player is to the left

Next it tries to move in directions from one (of two) tables. It uses a first table if the player and the monster are on the same line, or a second table if the lines are not the same. Here are the tables (with my comments and label names):

; used when the lines are the same ... we want to go up or down really ... but
; if that fails try right and left.
L672D:              DEFB E0, 20, 01, FF

; used when the lines aren't the same ... try lines then columns for movement.
L6731:              DEFB FF, 01 E0 20

Here FF means go left, 01 mean go right, E0 means go up and 20 means go down, so these could be re-written as:

monster_move_offset_1:        up, down, right, left   ; player and monster on same line
monster_move_offset_2:        left, right, up, down   ; player and monster on different lines

However, before it tries to move based on the table chosen, it swaps the first and second entries of this table. And note these tables are common for all monsters. It’s difficult to know exactly what this does without more runtime analysis, but I guess this might have several effects:

  • Specific monsters don't make the same decision as the previous if they are in the same predicament.
  • It also might solve 'stuck' problems for a specific monster.
After this swap, it tries each four options from the left hand table-entry to the right table-entry in turn. For the case on the same map line this will mean it will try to step up and then down first. For the case where they are on different lines, it will try to step left and right first.

If a movement in a direction succeeds it stores that direction in the ‘secondary’ direction and that’s everything completed for this time around.

There is a possible case where we cannot move in any direction - due to being surrounded by a combination of walls or monsters. In this case, the last entry in the table will be stored in the ‘secondary’ direction variable before leaving with the monster unmoved.

Special Case - Next Time the Monster Runs

The next time this monster runs (as defined by monster speed count as above), then (if we previously ran the special case algorithm) it will first try the primary direction (as defined last time we calculated this primary/secondary direction which it might be out of date but in normal cases only slightly) - this is towards where the player was when the special case started. If the primary direction works, then we go back to the normal case algorithm.

If, the primary direction is blocked, then we will try the secondary direction stored from the above processing that succeeded last time. If that doesn’t succeed (we’ve moved to a wall, or another monster is in the way) then we run the normal case algorithm (and possibly special case algorithm) again.

If the second direction does succeed then we will run this primary-then-secondary test again the next the monster runs - again and again  until the primary succeeds or the secondary is blocked.

One note about this two step special case system is that the monsters - when blocked from their objective (e.g. directly above the player but with walls in between) tend not to have more consistency in movement and don’t just hop backwards and forwards.

Phew! If you got this far, congratulations :-)

Saturday, July 14, 2012

My Raspberry Pi arrives!

 Simple video of me opening my Raspberry Pi package...

Wednesday, July 11, 2012

A Story of Z80 Disassembly in the 21st Century

A little over a year ago, I wrote a clone of an ancient (1982) Sinclair ZX Spectrum maze game whilst I was away on holiday for the iPhone for fun. I also eventually did a Mac port (mostly for debugging) and then a Windows port ‘just because’.

I got most for the core functionality done part time in two weeks - and spent a bit of time between other projects over the next few months filling out random bits when I had the time. There are still a couple of bits to do at this stage - but it’s not far off playing very close to the original - in fact, you’d have difficulty telling them apart unless you knew where to look.

I did this without looking at the internals (e.g. by reverse engineering from the binary) so all the code is an untainted ‘copyright-clean’ engineered version. Obviously the graphics, instructions and maps probably have enough originality to still be copyright - other items may or may not - although they all look weaker to me.

One thing that always bugged me was the ‘monster’ movement. In my version I used a straight forward ‘head towards the player’ with a simple wall follower if it’s obvious that’s going to get the monster stuck behind a wall. This worked reasonably - although there were circumstances I wasn’t happy with. I had other ideas like better pathfinding and a flood-fill maze solver like Micromouse robots use.

However, I was always curious how the original code worked for the monster movement - especially as the whole game was just over 6KBytes including graphics and mazes. There are two ways of finding this out:
    1.    Look at behaviour and try to figure out the algorithms behind it,
    2.    Disassemble the program and figure out how it works.

The first option looked less easy and fun, and seemed like (since I’d been observing the game for a while) it might not yield the answer accurately.

So this holiday I got a out a tiny & simple Z80 disassembler that was posted online ages ago (and I modified for a different project a while ago), created a string viewer, a graphics viewer and hex viewer and started to convert the binary machine code and data to a human readable format.

Like many of these things, it always takes longer than you think. And maybe just looking at the behaviour would have been faster in hindsight - but I’m not convinced. But I’ve had fun and also started to remember how the Sinclair ZX Spectrum worked, looked up the memory map, system variables, character set, ROM disassembly, various Z80 instructions, etc.

Also the monster movement control code is deeply buried - so at this point I’ve basically converted almost the whole program into human readable assembler instructions with comments and human labels (both for data and code). Also I’ve converted text string data and identified other data as well.

I’ve got the monster movement code analysed (and may do a follow up post with details) - the only bit I haven’t quiet yet got to the demo mode player movement control routine.

It’s always informative looking at other peoples code. You learn things not only about how the program works, but what the design compromises were and even get to guess at how the design process worked.

UPDATE 27Jan2012: Annotated disassembly posted on my page about Gulpman.

Newer›  ‹Older