Apparently some people don't believe that python's threading, while easier, is necessarily slower than a system that uses OS threads. This illustrates a fundamental misunderstanding of how OS threads work compared to "green" threads, also called coroutines. Python's threading model doesn't seem to quite match green threads exactly, but because of the global interpreter lock, they have a lot of the same performance problems, so I'll treat this test as an indication of the coroutine threading model's performance as well.
The problem with green threads is that they don't scale with the hardware. Sure, they scale beautifully in the number of processes you can spawn and control, but just because it shows two concurrent processes, doesn't mean you are actually utilizing the hardware effectively. Python's global interpreter lock ensures that two python processes can't be running at the same time. This resolves any possible variable locking problems. It also means, only one CPU core is being utilized, which has clear performance implications.
To illustrate this, I've created two simple test programs, one in python, one in Perl. They each split into a specifiable number of threads, and each thread performs some repetitive math operations 10,000,000 times and reports back how long it took to a centralized data structure. Finally, it prints out the sum of the times the threads took to perform the task, and the time the main process actually took spawn and collect them, which is the real processing time. Here they are:
Perl:
use strict;use warnings;use threads;use threads::shared;use Time::HiRes qw/time/;my @thread_times :shared;sub workfunc {my $start = time();print "Thread ".threads::tid()." :: started at ".localtime()."\n";my $init = shift;for (1 .. 10_000_000) {$init += (($_%4)*($_%4)) / ($_%10+1);}my $runtime = time() - $start;{ lock(@thread_times); push(@thread_times, $runtime); }print "Thread ".threads::tid()." :: stopped at ".localtime()."\n";return $runtime;}# Setupmy $num_threads = shift || 2;print "Running with $num_threads threads\n";# Main codemy $master_start = time();my @threads = map { threads->create(\&workfunc, rand(10)); } 1 .. $num_threads;my $thread_time = 0;for my $t (@threads) {$thread_time += $t->join();}my $master_time = time() - $master_start;# Find and print timesmy $lock_time = 0;{ lock(@thread_times); $lock_time += $_ for @thread_times; }print "Master logged $master_time runtime\n";print "Threads reported $lock_time combined runtime\n";
Python:
import mathimport randomimport timeimport sysimport threadingthread_times = []def workfunc(id,init):l = threading.local()l.start = time.clock()print "Thread "+str(id)+" :: started at "+time.asctime()for i in xrange (1, 10000000):init += ((i%4)*(i%4)) / (i%10+1)l.runtime = time.clock() - l.startthread_times.append(l.runtime)print "Thread "+str(id)+" :: stopped at "+time.asctime()return l.runtime# Setupnum_threads = 2if len(sys.argv)>1:num_threads = int(sys.argv[1])print "Running with "+str(num_threads)+" threads"# Main codemaster_start = time.clock()threads = []for i in xrange (1, num_threads+1):t = threading.Thread(target=workfunc,args=(i,random.random()*10))t.start()threads.append(t)for t in threads:t.join()master_time = time.clock() - master_start# Find and print timesthread_time = 0for t in thread_times:thread_time += tprint "Master logged "+str(master_time)+" runtime";print "Threads reported "+str(thread_time)+" combined runtime";
The results are very telling, and here they are:
Perl:
$ perl thread.pl 1Running with 1 threadsThread 1 :: started at Wed Jul 21 12:47:26 2010Thread 1 :: stopped at Wed Jul 21 12:47:31 2010Master logged 5.21124505996704 runtimeThreads reported 5.20653009414673 combined runtime$ perl thread.pl 2Running with 2 threadsThread 1 :: started at Wed Jul 21 12:47:36 2010Thread 2 :: started at Wed Jul 21 12:47:36 2010Thread 2 :: stopped at Wed Jul 21 12:47:42 2010Thread 1 :: stopped at Wed Jul 21 12:47:42 2010Master logged 5.42760705947876 runtimeThreads reported 10.6412711143494 combined runtime$ perl thread.pl 3Running with 3 threadsThread 1 :: started at Wed Jul 21 12:47:47 2010Thread 2 :: started at Wed Jul 21 12:47:47 2010Thread 3 :: started at Wed Jul 21 12:47:47 2010Thread 2 :: stopped at Wed Jul 21 12:47:54 2010Thread 1 :: stopped at Wed Jul 21 12:47:54 2010Thread 3 :: stopped at Wed Jul 21 12:47:56 2010Master logged 8.67003512382507 runtimeThreads reported 22.7089991569519 combined runtime$ perl thread.pl 4Running with 4 threadsThread 1 :: started at Wed Jul 21 12:47:59 2010Thread 2 :: started at Wed Jul 21 12:47:59 2010Thread 3 :: started at Wed Jul 21 12:47:59 2010Thread 4 :: started at Wed Jul 21 12:47:59 2010Thread 1 :: stopped at Wed Jul 21 12:48:09 2010Thread 3 :: stopped at Wed Jul 21 12:48:09 2010Thread 2 :: stopped at Wed Jul 21 12:48:10 2010Thread 4 :: stopped at Wed Jul 21 12:48:10 2010Master logged 10.9765570163727 runtimeThreads reported 43.3241600990295 combined runtime
Note that a single thread takes approximately 5.2 seconds. Two threads takes only a few fractions of a second more. That's because each thread ran on it's own core. The threads report they had a combined run time of 10+ seconds, as each had a 5+ second run time, but it was *in parallel*, so the real processing time was ~5.2 seconds. My system is a dual core system, so we finally see some slowdown at three threads, where it takes 50% longer. Four threads takes twice as long as two threads, as expected.
Now let's look at Python:
$ python thread.py 1Running with 1 threadsThread 1 :: started at Wed Jul 21 12:53:03 2010Thread 1 :: stopped at Wed Jul 21 12:53:08 2010Master logged 5.25 runtimeThreads reported 5.25 combined runtime$ python thread.py 2Running with 2 threadsThread 1 :: started at Wed Jul 21 12:53:10 2010Thread 2 :: started at Wed Jul 21 12:53:10 2010Thread 2 :: stopped at Wed Jul 21 12:53:22 2010Thread 1 :: stopped at Wed Jul 21 12:53:22 2010Master logged 14.66 runtimeThreads reported 29.09 combined runtime$ python thread.py 3Running with 3 threadsThread 1 :: started at Wed Jul 21 12:53:25 2010Thread 2 :: started at Wed Jul 21 12:53:25 2010Thread 3 :: started at Wed Jul 21 12:53:25 2010Thread 3 :: stopped at Wed Jul 21 12:53:43 2010Thread 1 :: stopped at Wed Jul 21 12:53:44 2010Thread 2 :: stopped at Wed Jul 21 12:53:44 2010Master logged 22.48 runtimeThreads reported 66.26 combined runtime$ python thread.py 4Running with 4 threadsThread 1 :: started at Wed Jul 21 12:53:46 2010Thread 2 :: started at Wed Jul 21 12:53:46 2010Thread 3 :: started at Wed Jul 21 12:53:46 2010Thread 4 :: started at Wed Jul 21 12:53:46 2010Thread 2 :: stopped at Wed Jul 21 12:54:12 2010Thread 1 :: stopped at Wed Jul 21 12:54:12 2010Thread 3 :: stopped at Wed Jul 21 12:54:13 2010Thread 4 :: stopped at Wed Jul 21 12:54:13 2010Master logged 30.16 runtimeThreads reported 120.4 combined runtime
Here we see where the global interpreter lock causing us problems. Each additional thread causes it to take the same amount more time. That's because only a single thread can run at any time, yielding no time savings for running four threads. It's equivalent to running them in sequence, in this case.
Now, it's only fair to note, this isn't always the case. We are seeing this behavior because this is a CPU bound problem. If the workload for each thread were IO bound, we would probably see much better performance from python than we are in this case. Green threads have the ability to much more closely and easily control the separate threads, which yields good results when you don't actually NEED multiple CPU cores.
In summary, Green threads, coroutines, and python's implementation of threads have clear disadvantages under certain workloads, to the point of having negligible to no performance benefits in some cases. If you are using a language that only offers one of these methods of threading and you have a process model that lends itself well to threading, I feel for you.
Update: I figured I should post the python and perl versions before I'm asked/accused.
$ perl -v
This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi
$ python -V
Python 2.6.2
Both are the standard RPMs included with the RHEL6 beta 2 install.
$ rpm -q perl
perl-5.10.1-109.el6.x86_64
$ rpm -q python
python-2.6.2-7.el6.x86_64