Saturday, July 21, 2012

Raspberry Pi Python Timings

Someone produced some interesting Python output tests (here) which showed that Python I/O on the Raspberry Pi is slow at the moment - maybe due to the technique of opening and closing files for userland I/O (an unconfirmed guess based on this).

When discussing this, my good friend Stu asked me to time how long it took Python to print out some numbers on a Raspberry Pi. Well, obviously that was way too simple. I came up with 7 very silly tests and used the Python module ‘timeit’, which tries to avoid some common problems when timing code. Even so, as the two runs on the Raspberry Pi over SSH show, the timings are not entirely consistent from run to run.
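One common way to tame run-to-run variation with ‘timeit’ is to repeat each measurement and keep the fastest result. The sketch below is not part of the script used for the results in this post; the helper name best_of and the repeat count of 3 are assumptions for illustration only.

```python
import timeit

def empty_func():
    pass

def best_of(func, passes, repeats=3):
    """Return the fastest per-call time in microseconds over several repeats.

    Hypothetical helper: the original script calls Timer.timeit() once per test.
    """
    timer = timeit.Timer(func)
    # repeat() returns a list with one total time (in seconds) per repeat
    totals = timer.repeat(repeat=repeats, number=passes)
    return min(totals) / passes * 1e6

if __name__ == "__main__":
    print("empty_func = %g microseconds/function" % best_of(empty_func, 100000))
```

Taking the minimum rather than the mean is the usual choice here, because background tasks can only make a run slower, never faster.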

Lastly, a warning about the results below: there are a lot of variables (memory, Python versions, exactly which background tasks are running, the immaturity of Raspbian (the Debian/Linux operating system distribution)). Also the Python tests are very stupid and overly simple and, as with all benchmarks, don’t represent real applications in the slightest. Therefore I’d warn against taking these seriously at all. Besides, lots of people have already built real applications. I’d like to time these on an equivalently priced 8-bit (or even 32-bit) single-chip microcontroller board. Oh, that’s right, Python wouldn’t even run....

Some Basic Analysis of the Results

The ‘normal PC’-type (expensive) computer is faster; not surprising to me.

The Dreamplug is slower at floating point (it doesn’t have hardware floating point), but faster at the list and empty function tests - probably because its roughly 70% higher clock speed (1.2 GHz vs. 700 MHz) translates directly into speed, and the ARMv5 to ARMv6 extensions don’t make up for this basic difference.
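To put rough numbers on that comparison, the ratios can be worked out from the figures in the Results section. This is only a sketch: it uses RUN2 of the Raspberry Pi SSH session (RUN1 gives noticeably different ratios), and the variable and function names are made up for illustration.

```python
# Per-call timings in microseconds, copied from the Results section.
# Raspberry Pi figures are from RUN2 of the SSH session (an arbitrary choice).
pi_ssh = {"list_append": 313.054, "empty_func": 2.17917, "float_func": 10.9426}
dreamplug = {"list_append": 193.037, "empty_func": 1.55294, "float_func": 17.3767}

# Dreamplug at 1.2 GHz versus Raspberry Pi at 700 MHz
clock_ratio = 1200.0 / 700.0

def speedup(test_name):
    """How many times faster the Dreamplug is than the Pi (below 1 means slower)."""
    return pi_ssh[test_name] / dreamplug[test_name]

print("clock ratio = %.2f" % clock_ratio)
for name in ("list_append", "empty_func", "float_func"):
    print("%s: Dreamplug speedup = %.2f" % (name, speedup(name)))
```

With these inputs the clock ratio comes out at about 1.71, the list and empty function tests at about 1.62 and 1.40, and the float test at about 0.63, so the clock speed explanation fits only loosely.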

The printing tests are very interesting, and I haven’t really had enough time to consider why some of these are so strange, especially the Dreamplug vs. Raspberry Pi on the SSH tests (both are wired via 100 Mbit/s Ethernet into a WiFi router).

One thing is clear: printing, and probably specifically scrolling the screen up one line, on a real Full HD (1920x1080) screen is SLOW on the Raspberry Pi because it doesn’t have any hardware acceleration enabled - pushing two million pixels (maybe 6 or 8 million bytes) to print EACH number is a lot of work (about 2.3 ms of work, by the look of it, or 1.6 million clock cycles if my calculations are correct).
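The arithmetic behind those figures can be checked with a few lines. This is a sketch using the while_print timing from the HDMI run; the 3 and 4 bytes per pixel are assumed colour depths, which is where the "6 or 8 million bytes" guess comes from.

```python
PRINTS_PER_CALL = 30           # each print test prints 30 numbers per call
CLOCK_HZ = 700e6               # Raspberry Pi ARM11 clock, 700 MHz
hdmi_while_print_us = 70540.2  # microseconds/function from the HDMI run

# Time and clock cycles spent on each printed number
per_print_ms = hdmi_while_print_us / PRINTS_PER_CALL / 1000.0
cycles_per_print = per_print_ms / 1000.0 * CLOCK_HZ

# Size of one Full HD frame at two assumed colour depths
pixels = 1920 * 1080
frame_bytes = dict((bpp, pixels * bpp) for bpp in (3, 4))

print("per print: %.2f ms, %.2f million cycles" % (per_print_ms, cycles_per_print / 1e6))
print("pixels per frame: %d" % pixels)
for bpp in sorted(frame_bytes):
    print("%d bytes/pixel: %.1f million bytes" % (bpp, frame_bytes[bpp] / 1e6))
```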

Future Tests

Print to the same line in the print tests (just on the screen on the Raspberry Pi) - either by leaving out the newlines or by resetting the cursor back to the start of the line, thereby avoiding ALL scrolling.

Run the same tests in another scripting language and in C.
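The first of these future tests, the cursor-reset variant, might look something like the sketch below. It has not been run on the Raspberry Pi; the function name same_line_print and its parameters are hypothetical, and it assumes a terminal that honours the carriage return character.

```python
import sys

def same_line_print(count=30, stream=sys.stdout):
    """Write the numbers 0..count-1 on one line.

    A carriage return before each number moves the cursor back to the
    start of the line, so the terminal never has to scroll.
    """
    for i in range(count):
        stream.write("\r%d" % i)
        stream.flush()
    # Finish with a single newline so later output starts on a fresh line
    stream.write("\n")

if __name__ == "__main__":
    same_line_print()
```

This could be timed the same way as the other tests, by adding its name and a pass count to test_suite.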


Results

2012-07-15-wheezy-raspbian Raspberry Pi ARM1176JZF ARMv6 700MHz
SDRAM: 256 MB (shared with GPU)
Python 2.7.3rc2 via SSH terminal session (Wired Ethernet)
Text Shell only, no other apps running

RUN1:
while_print = 11724 microseconds/function ( total = 117.24 s)
for_range_print = 11643.1 microseconds/function ( total = 116.431 s)
for_xrange_print = 11633.1 microseconds/function ( total = 116.331 s)
list_append = 365.965 microseconds/function ( total = 36.5965 s)
empty_func = 3.13354 microseconds/function ( total = 3.13354 s)
sys_sleep_func = 100184 microseconds/function ( total = 10.0184 s)

RUN2 (with float test added):
while_print = 11506.4 microseconds/function ( total = 115.064 s)
for_range_print = 11401.8 microseconds/function ( total = 114.018 s)
for_xrange_print = 11381.6 microseconds/function ( total = 113.816 s)
list_append = 313.054 microseconds/function ( total = 31.3054 s)
empty_func = 2.17917 microseconds/function ( total = 2.17917 s)
sys_sleep_func = 100188 microseconds/function ( total = 10.0188 s)
float_func = 10.9426 microseconds/function ( total = 10.9426 s)


2012-07-15-wheezy-raspbian Raspberry Pi ARM1176JZF ARMv6 700MHz
SDRAM: 256 MB (shared with GPU)
Python 2.7.3rc2 terminal on HDMI with Full HD frame buffer (no acceleration)
Text Shell only, no other apps running

while_print = 70540.2 microseconds/function ( total = 705.402 s)
for_range_print = 70276.7 microseconds/function ( total = 702.767 s)
for_xrange_print = 72477.2 microseconds/function ( total = 724.772 s)
list_append = 424.997 microseconds/function ( total = 42.4997 s)
empty_func = 3.98467 microseconds/function ( total = 3.98467 s)
sys_sleep_func = 100213 microseconds/function ( total = 10.0213 s)
float_func = 12.9488 microseconds/function ( total = 12.9488 s)


2012-07-15-wheezy-raspbian Raspberry Pi ARM1176JZF ARMv6 700MHz
SDRAM: 256 MB (shared with GPU)
Python 2.7.3rc2 terminal on HDMI with Full HD frame buffer (no acceleration) - print to file (on a slow 2GB SD card)
Text Shell only, no other apps running

python time_test.py > results.txt
tail results.txt

while_print = 632.781 microseconds/function ( total = 6.32781 s)
for_range_print = 670.373 microseconds/function ( total = 6.70373 s)
for_xrange_print = 647.219 microseconds/function ( total = 6.47219 s)
list_append = 465.027 microseconds/function ( total = 46.5027 s)
empty_func = 3.63238 microseconds/function ( total = 3.63238 s)
sys_sleep_func = 100196 microseconds/function ( total = 10.0196 s)
float_func = 15.9253 microseconds/function ( total = 15.9253 s)


Debian Lenny Dreamplug Marvell Sheeva ARMv5 (no FPU) 1.2 GHz
SDRAM: 512MB 800 MHz 16bit DDR2
Python 2.5.2 via SSH terminal session (Wired Ethernet)
Text Shell only, no other apps running

while_print = 954.053 microseconds/function ( total = 9.54053 s)
for_range_print = 935.227 microseconds/function ( total = 9.35227 s)
for_xrange_print = 932.759 microseconds/function ( total = 9.32759 s)
list_append = 193.037 microseconds/function ( total = 19.3037 s)
empty_func = 1.55294 microseconds/function ( total = 1.55294 s)
sys_sleep_func = 100203 microseconds/function ( total = 10.0203 s)
float_func = 17.3767 microseconds/function ( total = 17.3767 s)


Mac OS X 10.6.6 Core i7 (Dual with HT) 2.66GHz
SDRAM: 4 GB 1067 MHz DDR3
Python 2.6.6 in Terminal
6 apps running

while_print = 159.976 microseconds/function ( total = 1.59976 s)
for_range_print = 151.213 microseconds/function ( total = 1.51213 s)
for_xrange_print = 154.768 microseconds/function ( total = 1.54768 s)
list_append = 19.8449 microseconds/function ( total = 1.98449 s)
empty_func = 0.161463 microseconds/function ( total = 0.161463 s)
sys_sleep_func = 100101 microseconds/function ( total = 10.0101 s)
float_func = 0.347249 microseconds/function ( total = 0.347249 s)


Source Code

# Various really stupid Python benchmarking tests
# Rob Probin 2012

import time
import math

def while_print():
    i = 1
    while i <= 30:
        print i
        i+=1

def for_range_print():
    for i in range(30):
        print i

def for_xrange_print():
    for i in xrange(30):
        print i

def list_append():
    L = []
    for i in range(100):
        L.append(i)

def empty_func():
    pass
   
def sys_sleep_func():
    time.sleep(0.1)

def float_func():
    k = 1.5
    return math.sqrt(k*3.45)
   
test_suite = [
    ("while_print", 10000),
    ("for_range_print", 10000),
    ("for_xrange_print", 10000),
    ("list_append", 100000),
    ("empty_func", 1000000),
    ("sys_sleep_func", 100),
    ("float_func", 1000000),
]

if __name__ == '__main__':
    result = []
    from timeit import Timer

    for test in test_suite:
        test_name, passes = test

        # Python 2 print statement (with parentheses this would print a tuple)
        print "RUNNING TEST:", test_name
        # Time 'passes' calls of the test function, imported from this script
        t = Timer(test_name+"()", "from __main__ import "+test_name)
        test_time = t.timeit(number=passes)
        # Store (name, total seconds, seconds per call)
        test_results = (test_name, test_time, test_time/passes)
        result.append(test_results)

    print
    for result_set in result:
        (name, total, func_time) = result_set
        # Report seconds per call as microseconds
        print "%s = %g microseconds/function ( total = %g s)" % (name, func_time*1000000, total)
