2010-09-24

Threading: Perl vs Python (revisited)

I got a response to my threading post! Too bad I was too busy elsewhere to notice it and respond in a timely fashion.

It's a good post, and it builds on what I started to do in my post, which was to show some differences in threading through benchmarks and highlight some (ok, one) cases where you need to be aware of what your chosen model gets you. Juster seems to be looking to do an actual comparison of the models, and goes as far as finding better-matched models within each language to compare.

Some of his complaints about my methodology are that the benchmarks are only there to serve a point (apparently to make Perl enthusiasts happy), that the benchmarks are CPU bound, and that my code doesn't make use of threads as it should. This all makes sense, if you assume I was trying to compare Perl's concurrent processing to Python's, which I was not. Apparently I wasn't clear, so I'll attempt to rectify that now.

My post was meant only to illustrate the differences between truly concurrent processing through OS threads and Python's threading model, which for CPU-bound work behaves much like "green" threads. Please view my assertions and examples in that light.

Pandering to an audience - Apparently, JAPHs should be applauding me for bashing perennial foe Python in its thread-hole. Unfortunately I didn't set out to elevate Perl in any way, nor to disparage Python, just to discuss the threading models and the benefits and drawbacks of each for certain programming needs. I would think anyone with even a passing interest would see that as obvious, but apparently I was wrong. I guess you can't put Perl and Python together in a post with different results without implying that one is better than the other, no matter your original intention or assertions that it's a special case.

It's unfortunate that I chose to use Perl and Python. I could just as easily have compared Perl's threads with Perl's Coro module, or Python's threads with Python's multiprocessing capabilities. It would have been just as illustrative for my point. Alas, my mind was stuck in the original context of this, which was Perl and Python, and it seemed natural for me to just use the default threading of each. It WAS on my mind that Perl's threading isn't really true threading, but I figured that mattered little to my point.
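
To make that concrete, here is a rough sketch of the kind of within-Python comparison I could have made instead: the same flavor of CPU-bound loop handed to the multiprocessing module, which sidesteps the GIL by running separate interpreter processes. The function name, pool size, and inputs are placeholders of my own, not code from either benchmark.

import multiprocessing

def cpu_work(init):
    # Hypothetical stand-in for the benchmark work function below
    for i in xrange(1, 10000000):
        init += ((i % 4) * (i % 4)) / (i % 10 + 1)
    return init

if __name__ == '__main__':
    # Four real processes, so multiple cores can be busy despite the GIL
    pool = multiprocessing.Pool(processes=4)
    print pool.map(cpu_work, [1.0, 2.0, 3.0, 4.0])
    pool.close()
    pool.join()

Same arithmetic, but because each worker is a separate process with its own interpreter, adding workers up to the core count actually reduces the wall-clock time.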

CPU bound - Another critique is that my benchmarks are CPU bound. They are. By design. How else should I show that one technique doesn't help with CPU bound problems while the other does?

And note this was in my post:
Now, it's only fair to note that this isn't always the case. We are seeing this behavior because this is a CPU bound problem. If the workload for each thread were IO bound, we would probably see much better performance from Python than we do in this case. Green threads make it much easier and cheaper to control and switch between the separate threads, which yields good results when you don't actually NEED multiple CPU cores.

Flawed code - As stated, the code was specifically designed to take advantage of multiple CPU cores, not to exhibit a real world problem in all its complexity. That said, it does map to some problems fairly well. I do share data between the threads, but only after the main processing has completed. This is a common worker technique, where you spawn threads, hand them work to compute, and collect the results afterwards. Some sorting techniques may make use of this, as well as Perl 6's auto-threading. Regardless of all that, it does illustrate that some patterns don't lend themselves well to that model.
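
For what it's worth, here is a minimal sketch of that spawn / work / collect pattern (the job payloads and worker body are made up for illustration; the actual benchmark code is further down):

import threading
import Queue

def worker(job, results):
    # Stand-in computation; the result is only shared once the work is done
    results.put(sum(x * x for x in job))

results = Queue.Queue()
jobs = [range(1000), range(2000), range(3000)]
threads = [threading.Thread(target=worker, args=(job, results)) for job in jobs]
for t in threads:
    t.start()
for t in threads:
    t.join()
collected = [results.get() for _ in jobs]
print collected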


All said and done, I think Juster really missed the point of my post. I think he ran with what he got out of it though, and did a fairly thorough benchmarking of Perl vs. Python in more comparable terms. I think that should be applauded, so his post isn't without merit. I just wish he had read my post as what I was really trying to say. Though, judging from his post, I figure he understands it well enough.

Edit: It just occurred to me why Juster interpreted it the way he did. In a boneheaded move, I left the subject line from when I first started the article, and it says Perl vs. Python. With that coloring the reading of the article, it's only fair to assume I was actually comparing them. Mea culpa.

As a testament to just how oblivious I was to this, I also missed it when copy-pasting the subject into this new article. Duh.

2010-07-21

Threading: Perl vs Python


Apparently some people don't believe that Python's threading, while easier, is necessarily slower than a system that uses OS threads. This illustrates a fundamental misunderstanding of how OS threads work compared to "green" threads, also called coroutines. Python's threading model doesn't quite match green threads exactly, but because of the global interpreter lock it has a lot of the same performance problems, so I'll treat this test as an indication of the coroutine threading model's performance as well.
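
To be clear about terms, here is a toy illustration of what I mean by "green" threads / coroutines: everything runs in a single OS thread, and tasks only switch at explicit yield points. This is purely a concept sketch of cooperative scheduling, not how Python's threading module is actually implemented.

def task(name, count):
    for i in xrange(count):
        print name, "step", i
        yield                      # hand control back to the scheduler

def run(tasks):
    # Round-robin over the tasks until they all finish
    while tasks:
        for t in list(tasks):
            try:
                t.next()           # resume until the task's next yield
            except StopIteration:
                tasks.remove(t)

run([task("A", 3), task("B", 3)])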

The problem with green threads is that they don't scale with the hardware. Sure, they scale beautifully in the number of tasks you can spawn and control, but just because two of them appear to run concurrently doesn't mean you are actually utilizing the hardware effectively. Python's global interpreter lock ensures that two Python threads can't be executing Python code at the same time. This sidesteps a whole class of variable locking problems. It also means only one CPU core is being utilized, which has clear performance implications.

To illustrate this, I've created two simple test programs, one in Python, one in Perl. Each splits into a specifiable number of threads, and each thread performs some repetitive math operations 10,000,000 times and reports how long it took back to a centralized data structure. Finally, each program prints out the sum of the times the threads took to perform the task, and the time the main process actually took to spawn and collect them, which is the real processing time. Here they are:

Perl:
use strict;
use warnings;
use threads;
use threads::shared;
use Time::HiRes qw/time/;
my @thread_times :shared;

sub workfunc {
    my $start = time();
    print "Thread ".threads::tid()." :: started at ".localtime()."\n";
    my $init = shift;
    for (1 .. 10_000_000) {
        $init += (($_%4)*($_%4)) / ($_%10+1);
    }
    my $runtime = time() - $start;
    { lock(@thread_times); push(@thread_times, $runtime); }
    print "Thread ".threads::tid()." :: stopped at ".localtime()."\n";
    return $runtime;
}

# Setup
my $num_threads = shift || 2;
print "Running with $num_threads threads\n";

# Main code
my $master_start = time();
my @threads = map { threads->create(\&workfunc, rand(10)); } 1 .. $num_threads;
my $thread_time = 0;
for my $t (@threads) {
    $thread_time += $t->join();
}
my $master_time = time() - $master_start;

# Find and print times
my $lock_time = 0;
{ lock(@thread_times); $lock_time += $_ for @thread_times; }
print "Master logged $master_time runtime\n";
print "Threads reported $lock_time combined runtime\n";


Python:
import math
import random
import time
import sys
import threading
thread_times = []

def workfunc(id,init):
    l = threading.local()
    l.start = time.clock()
    print "Thread "+str(id)+" :: started at "+time.asctime()
    for i in xrange (1, 10000000):
        init += ((i%4)*(i%4)) / (i%10+1)
    l.runtime = time.clock() - l.start
    thread_times.append(l.runtime)
    print "Thread "+str(id)+" :: stopped at "+time.asctime()
    return l.runtime

# Setup
num_threads = 2
if len(sys.argv)>1:
    num_threads = int(sys.argv[1])
print "Running with "+str(num_threads)+" threads"

# Main code
master_start = time.clock()
threads = []
for i in xrange (1, num_threads+1):
    t = threading.Thread(target=workfunc,args=(i,random.random()*10))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
master_time = time.clock() - master_start

# Find and print times
thread_time = 0
for t in thread_times:
    thread_time += t
print "Master logged "+str(master_time)+" runtime"
print "Threads reported "+str(thread_time)+" combined runtime"
The results are very telling, and here they are:

Perl:
$ perl thread.pl 1
Running with 1 threads
Thread 1 :: started at Wed Jul 21 12:47:26 2010
Thread 1 :: stopped at Wed Jul 21 12:47:31 2010
Master logged 5.21124505996704 runtime
Threads reported 5.20653009414673 combined runtime
$ perl thread.pl 2
Running with 2 threads
Thread 1 :: started at Wed Jul 21 12:47:36 2010
Thread 2 :: started at Wed Jul 21 12:47:36 2010
Thread 2 :: stopped at Wed Jul 21 12:47:42 2010
Thread 1 :: stopped at Wed Jul 21 12:47:42 2010
Master logged 5.42760705947876 runtime
Threads reported 10.6412711143494 combined runtime
$ perl thread.pl 3
Running with 3 threads
Thread 1 :: started at Wed Jul 21 12:47:47 2010
Thread 2 :: started at Wed Jul 21 12:47:47 2010
Thread 3 :: started at Wed Jul 21 12:47:47 2010
Thread 2 :: stopped at Wed Jul 21 12:47:54 2010
Thread 1 :: stopped at Wed Jul 21 12:47:54 2010
Thread 3 :: stopped at Wed Jul 21 12:47:56 2010
Master logged 8.67003512382507 runtime
Threads reported 22.7089991569519 combined runtime
$ perl thread.pl 4
Running with 4 threads
Thread 1 :: started at Wed Jul 21 12:47:59 2010
Thread 2 :: started at Wed Jul 21 12:47:59 2010
Thread 3 :: started at Wed Jul 21 12:47:59 2010
Thread 4 :: started at Wed Jul 21 12:47:59 2010
Thread 1 :: stopped at Wed Jul 21 12:48:09 2010
Thread 3 :: stopped at Wed Jul 21 12:48:09 2010
Thread 2 :: stopped at Wed Jul 21 12:48:10 2010
Thread 4 :: stopped at Wed Jul 21 12:48:10 2010
Master logged 10.9765570163727 runtime
Threads reported 43.3241600990295 combined runtime
Note that a single thread takes approximately 5.2 seconds. Two threads take only a fraction of a second more. That's because each thread ran on its own core. The threads report a combined run time of over 10 seconds, since each took over 5 seconds, but they ran *in parallel*, so the real processing time was still only ~5.4 seconds. My system is a dual core system, so we finally see some slowdown at three threads, where it takes roughly 60% longer. Four threads take about twice as long as two threads, as expected.

Now let's look at Python:
$ python thread.py 1
Running with 1 threads
Thread 1 :: started at Wed Jul 21 12:53:03 2010
Thread 1 :: stopped at Wed Jul 21 12:53:08 2010
Master logged 5.25 runtime
Threads reported 5.25 combined runtime
$ python thread.py 2
Running with 2 threads
Thread 1 :: started at Wed Jul 21 12:53:10 2010
Thread 2 :: started at Wed Jul 21 12:53:10 2010
Thread 2 :: stopped at Wed Jul 21 12:53:22 2010
Thread 1 :: stopped at Wed Jul 21 12:53:22 2010
Master logged 14.66 runtime
Threads reported 29.09 combined runtime
$ python thread.py 3
Running with 3 threads
Thread 1 :: started at Wed Jul 21 12:53:25 2010
Thread 2 :: started at Wed Jul 21 12:53:25 2010
Thread 3 :: started at Wed Jul 21 12:53:25 2010
Thread 3 :: stopped at Wed Jul 21 12:53:43 2010
Thread 1 :: stopped at Wed Jul 21 12:53:44 2010
Thread 2 :: stopped at Wed Jul 21 12:53:44 2010
Master logged 22.48 runtime
Threads reported 66.26 combined runtime
$ python thread.py 4
Running with 4 threads
Thread 1 :: started at Wed Jul 21 12:53:46 2010
Thread 2 :: started at Wed Jul 21 12:53:46 2010
Thread 3 :: started at Wed Jul 21 12:53:46 2010
Thread 4 :: started at Wed Jul 21 12:53:46 2010
Thread 2 :: stopped at Wed Jul 21 12:54:12 2010
Thread 1 :: stopped at Wed Jul 21 12:54:12 2010
Thread 3 :: stopped at Wed Jul 21 12:54:13 2010
Thread 4 :: stopped at Wed Jul 21 12:54:13 2010
Master logged 30.16 runtime
Threads reported 120.4 combined runtime
Here we see the global interpreter lock causing us problems. Each additional thread adds roughly another full single-thread's worth of runtime. That's because only a single thread can execute at any given time, yielding no time savings from running four threads. It's no better than running them in sequence, in this case; if anything, the lock contention overhead makes it slightly worse (30 seconds for four threads versus roughly 4 x 5.25 = 21 seconds run back to back).

Now, it's only fair to note that this isn't always the case. We are seeing this behavior because this is a CPU bound problem. If the workload for each thread were IO bound, we would probably see much better performance from Python than we do in this case. Green threads make it much easier and cheaper to control and switch between the separate threads, which yields good results when you don't actually NEED multiple CPU cores.
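
To give a feel for that, here is a minimal sketch of the IO-bound situation, using time.sleep() as a stand-in for a network or disk wait (this isn't one of the benchmark scripts). The GIL is released while a thread is blocked waiting, so the waits overlap, and four two-second waits finish in roughly two seconds of wall time rather than eight.

import threading
import time

def io_work(seconds):
    time.sleep(seconds)        # pretend this is a network or disk wait

start = time.time()
threads = [threading.Thread(target=io_work, args=(2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print "4 x 2s waits finished in", round(time.time() - start, 2), "seconds"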

In summary, green threads, coroutines, and Python's implementation of threads have clear disadvantages under certain workloads, to the point of offering negligible or no performance benefit in some cases. If you are using a language that only offers one of these threading methods and you have a CPU-bound problem that would otherwise lend itself well to threading, I feel for you.

Update: I figured I should post the Perl and Python versions before I'm asked/accused.
$ perl -v
This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi
$ python -V
Python 2.6.2

Both are the standard RPMs included with the RHEL6 beta 2 install.

$ rpm -q perl
perl-5.10.1-109.el6.x86_64
$ rpm -q python
python-2.6.2-7.el6.x86_64

2010-07-19

Perl promotion

Why is making money from a community a bad thing?

Gabor has been taking a lot of criticism for attempting to promote Perl, and for proposing that he get paid to do it. Apparently, many people are of the opinion that getting paid to do community work is trying to pull a fast one on the community.

It's interesting to see the backlash from people when they think someone is going to somehow profit from a community they are part of. I see nothing fundamentally wrong with someone making a profit from the Perl community (not that I think that's Gabor's goal). The only metric that matters here should be whether the gains outweigh the costs.

Personally, I don't care if Gabor is paid $1, $10,000, or $1,000,000. If the gains match or exceed the cost, it was worth it. Quantifying that can be hard, but I find it hard to believe we can't agree that the attempt is worthwhile, and we can determine later whether it was successful in any part.

As a thought exercise, let's examine this from the opposite direction. Assume the community, or some part thereof, has the option to pay $1,000,000 (which let's assume it has) to an individual who guarantees that Perl becomes the #1 most popular programming language within the next year. Is it worth it? Is it even desirable? If the answer to both is yes, you are in fundamental agreement with Gabor's proposal, and only the implementation details remain to be worked out. And if that's the case, if not Gabor, then who else?