Archive for July 23rd, 2010

Asynchronous data structures with Twisted Deferreds

Friday, July 23rd, 2010

Earlier this week I gave a talk titled Deferred Gratification (slides) at EuroPython in Birmingham. The title was supposed to hint at how much I love Twisted‘s Deferred class, and that it took me some time to really appreciate it.

While thinking about Deferreds in the days before the talk, I think I put my finger on one of the reasons I find Deferreds so attractive and elegant. The thought can be neatly summarized: Deferreds let you replace synchronous data structures with elegant asynchronous ones.

There are various ways that people get exposed to thinking about programming in an asynchronous style. The great majority do so via building user interfaces, usually with widget libraries wherein one specifies widgets and layouts, sets up event handlers, and then surrenders control to a main loop that thereafter routes events to handlers. Exposure to asynchronous programming via building UIs is becoming much more common as Javascript programmers build client-side web apps that operate asynchronously with web servers (and the “A” in AJAX of course stands for asynchronous). Others become aware of asynchronous programming via writing network code (perhaps using Twisted). Relatively few become aware of asynchronous programming via doing async filesystem I/O.

Because Twisted’s Deferreds don’t have anything to do with UIs or networking or filesystems, you can use them to implement other asynchronous things, like an asynchronous data structure. To show you what I mean, here’s a slightly simplified version of Twisted’s DeferredQueue class, taken from twisted/internet/defer.py:

class DeferredQueue(object):

    def __init__(self):
        # It would be better if these both used collections.deque (see comments section below).
        self.waiting = [] # Deferreds that are expecting an object
        self.pending = [] # Objects queued to go out via get.

    def put(self, obj):
        if self.waiting:
            self.waiting.pop(0).callback(obj)
        else:
            self.pending.append(obj)

    def get(self):
        if self.pending:
            return succeed(self.pending.pop(0))
        else:
            d = Deferred()
            self.waiting.append(d)
            return d
 

I find this extremely elegant, and I’m going to explain why.

But first, think about code for a regular synchronous queue. What happens if you call get on a regular queue that’s empty? Almost certainly one of two things: you’ll get some kind of QueueEmpty error, or else your code will block until some other code puts something into the queue. I.e., you either get a synchronous error or you get a synchronous non-empty response.

If you look at the get method in the code above, you’ll see that if the queue is empty (i.e., the pending list is empty), a new Deferred is made, added to self.waiting, and is immediately returned to the caller. So code calling get on an empty queue doesn’t get an error and doesn’t block, it always gets a result back essentially immediately. How can you get a result from an empty queue? Easy: the result is a Deferred. And because we’re in the asynchronous world, you just attach callbacks (like event handlers in the UI world) to the Deferred, and go on your merry way.

If you can follow that thinking, the rest of the code in the class above should be easy to grok. In put, if there are any outstanding Deferreds (i.e., earlier callers who hit an empty queue and got a Deferred back), the incoming object is given to the first of these by passing it to the callback function of the Deferred (and popping it out of the waiting list). If there are no outstanding Deferreds expecting a result, the incoming object is simply appended to self.pending. On the get side, if the queue (i.e., self.pending) is non-empty, the code creates a Deferred that has already been fired (using the succeed helper function) and returns it.

By now, this kind of code seems routine and straightforward to me. But it certainly wasn’t always that way. So if the above seems cryptic or abstract, I encourage you to think it over, maybe write some code, ask questions below, etc. To my mind, these kinds of constructs – simple, elegant, robust, practical, obviously(?) bug-free, single-threaded, etc. – are extremely instructive. You too can solve potentially gnarly (at least in the threaded world) networking coding challenges in very simple ways just by building code out of simple Twisted Deferreds (there’s nothing special about the DeferredQueue class, it’s just built using primitive Deferreds).

For extra points, think about the concept of asynchronous data structures. I find that concept very attractive, and putting my finger on it helped me to see part of the reason why I think Deferreds are so great. (In fact, as you can see from the use of succeed, Deferreds are not specifically asynchronous or synchronous: they’re so simple they don’t even care, and as a consumer of Deferreds, your code doesn’t have to either). That’s all remarkably elegant.

I sometimes hope O’Reilly will publish a second volume of Beautiful Code and ask me to write a chapter. I know exactly what I’d write about :-)