2013-04-22

Rolling your own message/job queue in MySQL (Part 1)

Let's do an experiment.  Let's theorize on what we need to implement a job queue in MySQL.  I'll use this as a learning experience to discover more about message and job queue systems.  But first, an explanation as to why we're doing this in MySQL and not using some other, optimized software:
  • We can tightly couple our jobs to our data, allowing for better statistics
  • We can use all the existing tools we have at our disposal for SQL integration (ORMs, etc.)
  • We can use the raw flexibility of SQL itself
  • It will allow me to prototype these systems quickly without having to worry about the underlying data structure (beyond SQL schema) and transfer protocol (although we will need to design our API).
At the same time, there are some distinct disadvantages, including:
  • Without special care, future usage and data growth may impair the queue's function, so the design has to plan for both up front.
  • The storage medium is not optimized as a queue, so it will never be quite as fast as a dedicated solution.
  • Too much explicit linking to data in the specification may hamper general purpose use, while too little will negate the benefits of using the same system as the data.
It's a fine line between taking too much advantage of the data in the system, and not enough.  Where that line sits probably shifts over time as well, making it even harder.  I think the best way to deal with this particular complexity is to plan for one table per job type or category.  This allows you to link to the underlying data for that job appropriately without imposing on other job types.  As a fallback, a general queue with an indexed type field and arbitrary payload (serialized JSON, perhaps) could stand in for general jobs/messages until a type graduates to needing its own table because of tighter data coupling.
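To make the fallback concrete, here is a minimal sketch of a general queue table with an indexed type field and a JSON payload.  It is only an illustration under a few assumptions: Python's built-in sqlite3 module stands in for MySQL so the example runs anywhere, and the table, column, and function names (general_queue, enqueue, peek) are hypothetical rather than part of any design settled on in this post.

```python
import json
import sqlite3

# Assumption: SQLite (in-memory) stands in for MySQL so the sketch is runnable.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE general_queue (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        type       TEXT NOT NULL,   -- job/message category
        payload    TEXT NOT NULL,   -- serialized JSON, opaque to the queue
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Index on (type, id) so "oldest message of type X" avoids a full scan.
    CREATE INDEX idx_general_queue_type ON general_queue (type, id);
""")


def enqueue(conn, msg_type, payload):
    """Store one message; the payload is serialized to JSON text."""
    cur = conn.execute(
        "INSERT INTO general_queue (type, payload) VALUES (?, ?)",
        (msg_type, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


def peek(conn, msg_type):
    """Return (id, payload) for the oldest message of a type, or None."""
    row = conn.execute(
        "SELECT id, payload FROM general_queue "
        "WHERE type = ? ORDER BY id LIMIT 1",
        (msg_type,),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


first_id = enqueue(conn, "send_email", {"to": "user@example.com"})
enqueue(conn, "resize_image", {"path": "/tmp/a.png"})
```

Once a type such as send_email needs real foreign keys into the application's data, it would move out of this table and into its own.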

With these requirements in place, we can now address the implementation of a general purpose job/message queue in MySQL.  To help us make a general purpose implementation that isn't unintentionally influenced too much by our own data and needs (let's assume we can't perfectly model all our future needs), let's take note of some of the features of leading message and job queue software.

The AMQP protocol section on messaging capabilities has a few items of interest:
  1. some standard outcomes for transfers, through which receivers of messages can for example accept or reject messages
  2. a mechanism for indicating or requesting one of the two basic distribution patterns, competing- and non-competing- consumers, through the distribution modes move and copy respectively
  3. the ability to create nodes on-demand, e.g. for temporary response queues
  4. the ability to refine the set of messages of interest to a receiver through filters
All of these seem like sane capabilities we would want to have.  The fourth item is already inherently available by our choice of SQL, so that's a bonus!  Now let's look at a job queue implementation, and see what it throws into the mix.  For this, let's look at Gearman.  It's a relatively popular job queue implementation, so by looking at a client implementation we should get a good idea what they view as requirements to implement a good job queue system.  For this I'm going to look at Perl's Gearman::Task.  I'll summarize the methods it supplies and their uses, from the perldoc page:
  • uniq - If this job exists in the queue, merge this request with that one, and notify both clients when finished
  • on_status - Call specified code when job changes status
  • on_{complete,retry,fail} - Call the specified code when the specific action/status reached
  • retry_count - Allow this many failures before giving up on job
  • timeout - set status to failed if specified time has elapsed without a success or fail status already
From this it becomes somewhat obvious that while there are similarities between message queues and job queues, they have different responsibilities.  Message queues concern themselves with correct delivery of messages to one or more consumers, with performance being a major constraint.  Additionally, message queues appear to be mainly unidirectional.  You put a message in the queue, and it's consumed, but the queue does not concern itself with the content of the message or its eventual status.  Job queues concern themselves more with the eventual state of the message, or job.  They track what should happen on success or failure, and provide mechanisms to track that (asynchronously) at the submission level.  In fact, a lot of job queues are implemented on top of message queues, as some of the implicit requirements are similar enough that they can be reused or built upon easily.
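The extra state a job queue carries can be sketched in a few lines.  This is a rough illustration, not Gearman's implementation or a final design: SQLite again stands in for MySQL, and the jobs table, its columns, and the submit/claim/report_failure helpers are all hypothetical names.  The claim step shows the competing-consumers ("move") pattern from the AMQP list, and report_failure shows how a retry_count might be honored.

```python
import sqlite3

# Assumption: SQLite (in-memory) stands in for MySQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        status      TEXT NOT NULL DEFAULT 'queued',  -- queued | running | failed
        claimed_by  TEXT,
        failures    INTEGER NOT NULL DEFAULT 0,
        retry_count INTEGER NOT NULL DEFAULT 0       -- failures allowed before giving up
    )
""")


def submit(conn, retry_count=0):
    """Add a job and return its id."""
    cur = conn.execute("INSERT INTO jobs (retry_count) VALUES (?)", (retry_count,))
    conn.commit()
    return cur.lastrowid


def claim(conn, worker):
    """Competing consumers: the guarded UPDATE lets only one worker win a job."""
    row = conn.execute(
        "SELECT id FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    cur = conn.execute(
        "UPDATE jobs SET status = 'running', claimed_by = ? "
        "WHERE id = ? AND status = 'queued'",
        (worker, row[0]),
    )
    conn.commit()
    # rowcount is 0 if another worker flipped the row first.
    return row[0] if cur.rowcount == 1 else None


def report_failure(conn, job_id):
    """Requeue the job until its failures exceed retry_count, then fail it."""
    # status is assigned before failures so the CASE sees the old counter
    # even on engines that evaluate SET clauses left to right.
    conn.execute(
        "UPDATE jobs SET "
        "status = CASE WHEN failures + 1 > retry_count "
        "THEN 'failed' ELSE 'queued' END, "
        "failures = failures + 1, claimed_by = NULL "
        "WHERE id = ? AND status = 'running'",
        (job_id,),
    )
    conn.commit()
    return conn.execute(
        "SELECT status FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()[0]
```

A plain message queue would stop at claim; everything in report_failure is bookkeeping that only a job queue needs.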

So, what have we learned so far?  We've learned some of the core differences between message queues and job queues, and that their individual uses are different enough that one solution encompassing both will probably serve neither need well.

So, where does that leave us?  Well, I think we need to build a message queue, which is the simpler of the two, and then we can look at a job queue.  With a little luck, we'll be able to take advantage of the medium to not just build a job queue on top of a message queue, but alter some of our behavior subtly to achieve the same result.  I can imagine a post for each initial implementation, plus some testing against existing systems at the end, so it looks like we may be in for at least four parts.

