2010-07-21

Threading: Perl vs Python


Apparently some people don't believe that python's threading, while easier, is necessarily slower than a system that uses OS threads. This illustrates a fundamental misunderstanding of how OS threads work compared to "green" threads, also called coroutines. Python's threading model doesn't seem to quite match green threads exactly, but because of the global interpreter lock, they have a lot of the same performance problems, so I'll treat this test as an indication of the coroutine threading model's performance as well.

The problem with green threads is that they don't scale with the hardware. Sure, they scale beautifully in the number of processes you can spawn and control, but just because it shows two concurrent processes, doesn't mean you are actually utilizing the hardware effectively. Python's global interpreter lock ensures that two python processes can't be running at the same time. This resolves any possible variable locking problems. It also means, only one CPU core is being utilized, which has clear performance implications.

To illustrate this, I've created two simple test programs, one in python, one in Perl. They each split into a specifiable number of threads, and each thread performs some repetitive math operations 10,000,000 times and reports back how long it took to a centralized data structure. Finally, it prints out the sum of the times the threads took to perform the task, and the time the main process actually took spawn and collect them, which is the real processing time. Here they are:

Perl:
use strict;
use warnings;
use threads;
use threads::shared;
use Time::HiRes qw/time/;
my @thread_times :shared;

sub workfunc {
my $start = time();
print "Thread ".threads::tid()." :: started at ".localtime()."\n";
my $init = shift;
for (1 .. 10_000_000) {
$init += (($_%4)*($_%4)) / ($_%10+1);
}
my $runtime = time() - $start;
{ lock(@thread_times); push(@thread_times, $runtime); }
print "Thread ".threads::tid()." :: stopped at ".localtime()."\n";
return $runtime;
}

# Setup
my $num_threads = shift || 2;
print "Running with $num_threads threads\n";

# Main code
my $master_start = time();
my @threads = map { threads->create(\&workfunc, rand(10)); } 1 .. $num_threads;
my $thread_time = 0;
for my $t (@threads) {
$thread_time += $t->join();
}
my $master_time = time() - $master_start;

# Find and print times
my $lock_time = 0;
{ lock(@thread_times); $lock_time += $_ for @thread_times; }
print "Master logged $master_time runtime\n";
print "Threads reported $lock_time combined runtime\n";


Python:
import math
import random
import time
import sys
import threading
thread_times = []

def workfunc(id,init):
l = threading.local()
l.start = time.clock()
print "Thread "+str(id)+" :: started at "+time.asctime()
for i in xrange (1, 10000000):
init += ((i%4)*(i%4)) / (i%10+1)
l.runtime = time.clock() - l.start
thread_times.append(l.runtime)
print "Thread "+str(id)+" :: stopped at "+time.asctime()
return l.runtime

# Setup
num_threads = 2
if len(sys.argv)>1:
num_threads = int(sys.argv[1])
print "Running with "+str(num_threads)+" threads"

# Main code
master_start = time.clock()
threads = []
for i in xrange (1, num_threads+1):
t = threading.Thread(target=workfunc,args=(i,random.random()*10))
t.start()
threads.append(t)
for t in threads:
t.join()
master_time = time.clock() - master_start

# Find and print times
thread_time = 0
for t in thread_times:
thread_time += t
print "Master logged "+str(master_time)+" runtime";
print "Threads reported "+str(thread_time)+" combined runtime";
The results are very telling, and here they are:

Perl:
$ perl thread.pl 1
Running with 1 threads
Thread 1 :: started at Wed Jul 21 12:47:26 2010
Thread 1 :: stopped at Wed Jul 21 12:47:31 2010
Master logged 5.21124505996704 runtime
Threads reported 5.20653009414673 combined runtime
$ perl thread.pl 2
Running with 2 threads
Thread 1 :: started at Wed Jul 21 12:47:36 2010
Thread 2 :: started at Wed Jul 21 12:47:36 2010
Thread 2 :: stopped at Wed Jul 21 12:47:42 2010
Thread 1 :: stopped at Wed Jul 21 12:47:42 2010
Master logged 5.42760705947876 runtime
Threads reported 10.6412711143494 combined runtime
$ perl thread.pl 3
Running with 3 threads
Thread 1 :: started at Wed Jul 21 12:47:47 2010
Thread 2 :: started at Wed Jul 21 12:47:47 2010
Thread 3 :: started at Wed Jul 21 12:47:47 2010
Thread 2 :: stopped at Wed Jul 21 12:47:54 2010
Thread 1 :: stopped at Wed Jul 21 12:47:54 2010
Thread 3 :: stopped at Wed Jul 21 12:47:56 2010
Master logged 8.67003512382507 runtime
Threads reported 22.7089991569519 combined runtime
$ perl thread.pl 4
Running with 4 threads
Thread 1 :: started at Wed Jul 21 12:47:59 2010
Thread 2 :: started at Wed Jul 21 12:47:59 2010
Thread 3 :: started at Wed Jul 21 12:47:59 2010
Thread 4 :: started at Wed Jul 21 12:47:59 2010
Thread 1 :: stopped at Wed Jul 21 12:48:09 2010
Thread 3 :: stopped at Wed Jul 21 12:48:09 2010
Thread 2 :: stopped at Wed Jul 21 12:48:10 2010
Thread 4 :: stopped at Wed Jul 21 12:48:10 2010
Master logged 10.9765570163727 runtime
Threads reported 43.3241600990295 combined runtime
Note that a single thread takes approximately 5.2 seconds. Two threads takes only a few fractions of a second more. That's because each thread ran on it's own core. The threads report they had a combined run time of 10+ seconds, as each had a 5+ second run time, but it was *in parallel*, so the real processing time was ~5.2 seconds. My system is a dual core system, so we finally see some slowdown at three threads, where it takes 50% longer. Four threads takes twice as long as two threads, as expected.

Now let's look at Python:
$ python thread.py 1
Running with 1 threads
Thread 1 :: started at Wed Jul 21 12:53:03 2010
Thread 1 :: stopped at Wed Jul 21 12:53:08 2010
Master logged 5.25 runtime
Threads reported 5.25 combined runtime
$ python thread.py 2
Running with 2 threads
Thread 1 :: started at Wed Jul 21 12:53:10 2010
Thread 2 :: started at Wed Jul 21 12:53:10 2010
Thread 2 :: stopped at Wed Jul 21 12:53:22 2010
Thread 1 :: stopped at Wed Jul 21 12:53:22 2010
Master logged 14.66 runtime
Threads reported 29.09 combined runtime
$ python thread.py 3
Running with 3 threads
Thread 1 :: started at Wed Jul 21 12:53:25 2010
Thread 2 :: started at Wed Jul 21 12:53:25 2010
Thread 3 :: started at Wed Jul 21 12:53:25 2010
Thread 3 :: stopped at Wed Jul 21 12:53:43 2010
Thread 1 :: stopped at Wed Jul 21 12:53:44 2010
Thread 2 :: stopped at Wed Jul 21 12:53:44 2010
Master logged 22.48 runtime
Threads reported 66.26 combined runtime
$ python thread.py 4
Running with 4 threads
Thread 1 :: started at Wed Jul 21 12:53:46 2010
Thread 2 :: started at Wed Jul 21 12:53:46 2010
Thread 3 :: started at Wed Jul 21 12:53:46 2010
Thread 4 :: started at Wed Jul 21 12:53:46 2010
Thread 2 :: stopped at Wed Jul 21 12:54:12 2010
Thread 1 :: stopped at Wed Jul 21 12:54:12 2010
Thread 3 :: stopped at Wed Jul 21 12:54:13 2010
Thread 4 :: stopped at Wed Jul 21 12:54:13 2010
Master logged 30.16 runtime
Threads reported 120.4 combined runtime
Here we see where the global interpreter lock causing us problems. Each additional thread causes it to take the same amount more time. That's because only a single thread can run at any time, yielding no time savings for running four threads. It's equivalent to running them in sequence, in this case.

Now, it's only fair to note, this isn't always the case. We are seeing this behavior because this is a CPU bound problem. If the workload for each thread were IO bound, we would probably see much better performance from python than we are in this case. Green threads have the ability to much more closely and easily control the separate threads, which yields good results when you don't actually NEED multiple CPU cores.

In summary, Green threads, coroutines, and python's implementation of threads have clear disadvantages under certain workloads, to the point of having negligible to no performance benefits in some cases. If you are using a language that only offers one of these methods of threading and you have a process model that lends itself well to threading, I feel for you.

Update: I figured I should post the python and perl versions before I'm asked/accused.
$ perl -v
This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi
$ python -V
Python 2.6.2

Both are the standard RPMs included with the RHEL6 beta 2 install.

$ rpm -q perl
perl-5.10.1-109.el6.x86_64
$ rpm -q python
python-2.6.2-7.el6.x86_64

1 comment:

  1. Sorry I am so late to respond. I was on vacation for several days. My responding comment started becoming very long so I instead created a blog and posted in response. The post is available here: http://xrl.us/bhunjn

    ReplyDelete