Archive for the ‘twisted’ Category

Learning jQuery Deferreds published

Tuesday, January 7th, 2014

learning-jQuery-deferredsNicholas Tollervey and I have written a book, Learning jQuery Deferreds, published by O’Reilly.

If you’ve been a reader of this blog over the years, you may have noticed that I’m very fond of deferreds (also known as promises or futures). I’ve mainly been using them in Python with Twisted, and a couple of years ago was happy to notice that jQuery also has a (somewhat different) version of deferreds. Asking around, it soon became clear that although there are tens of millions of programmers who’ve used jQuery, very few of them have ever used deferreds. E.g., at the jQuery conference in San Francisco in 2012, only about 25% of the audience in a talk on deferreds. There are about 19,000 results for “deferred” on StackOverflow.

This seemed like a perfect storm: a fantastically cool and empowering technology that I love thinking and writing about, built in to a ubiquitous web library, in use by millions of programmers, and yet somehow not widely known or used.

The book tries to really teach deferreds. There are 18 real-life examples, along with 75 challenges (and solutions). We’ve tried to get readers into deferreds the way we are, to be provocative, to try and get you thinking and scratching your head. To get you to see how cool and elegant working with deferreds can be. In short, to make you one of us.

Although the book focusses on jQuery’s flavor of deferreds, we wrote it with a much broader aim: to be just as useful to people working with any flavor of deferreds in any language. The concepts behind deferreds are few and simple. Even if you’re not a jQuery user or a JavaScript programmer, we hope you’ll enjoy and benefit from the book.

If you’re curious, the animal on the cover is a musky rat-kangaroo. O’Reilly chose it for us. When I first saw it, I mailed them and Nicholas to say it looked “overfed, passive, and thoughtful” to which Nicholas replied that it resembled me. The O’Reilly toolchain is modern and fun, employing AsciiDoc and a shared Git repo. We wrote 30,817 words and 2,301 lines of Javascript. There’s a source code repo for all the book examples, at https://github.com/jquery-deferreds/code. We spent six months on the book, during which I usually spent 1-2 days a week on it. It was a blast to write.

If you’d like to buy a copy, use AUTHD as a discount code on the O’Reilly site and you’ll receive 40% off the print or 50% off the e-book. Please let me know what you think, and/or leave a review on the O’Reilly (or Amazon) site. Have fun, and thanks!

txdlo – a Twisted deferred list observer

Monday, December 30th, 2013

Last night I wrote txdlo, a Python package that provides a class called DeferredListObserver. As you might guess, it lets you observe callback and errback events from a list of Twisted deferreds. You can add observers that will be passed information about deferreds firing. You can add deferreds to the observed list at any time, which is very useful if you’re dynamically creating deferreds that you want to monitor.

The class can be used to easily build functions or classes that provide deferreds that fire when arbitrary combinations of events from the observed deferreds have occurred.

For example you can write functions or classes that support deferreds that:

  • Implement Twisted’s DeferredList or simple variants of it, or that let you separate the various behaviors of DeferredList into simpler functions.
  • Provide a deferred that fires when N of the observed deferreds have fired.
  • Provide a deferred that ignores errors until one of the observed deferred succeeds, only firing with an error if all the observed deferreds fail.
  • Or (a more involved example), suppose you have 3 methods that can return you a user’s avatar: a fast local cache, a filesystem, and a slow network call to Gravatar. You want to write a deferred-returning function that launches all three lookups at once and fires its deferred with the first answer. But if the cache and/or filesystems fails first, you don’t want to trigger an error, you instead want to take the result from Gravatar and add it to the cache and/or filesystem, as well firing the returned deferred with the result (wherever it comes from). Only if all three lookups fail do you want to errback the deferred you returned.

Here’s a sample example usage, which shows how to use DeferredListObserver to build a simplified version of Twisted’s DeferredList class as a function:

from twisted.internet.defer import Deferred, succeed
from txdlo import DeferredListObserver

def deferredList(deferreds):
    """
    Return a deferred that fires with a list of (success, result) tuples,
    'success' being a boolean.

    @param deferreds: a C{list} of deferreds.
    @return: a L{twisted.internet.defer.Deferred} that fires as above.
    """

    if len(deferreds) == 0:
        return succeed([])

    dlo = DeferredListObserver(maintainHistory=True)
    map(dlo.append, deferreds)
    deferred = Deferred()

    def observer(*ignore):
        if dlo.pendingCount == 0:
            # Everything in the list has fired.
            resultList = [None] * len(deferreds)
            for index, success, value in dlo.history:
                resultList[index] = (success, value)
            deferred.callback(resultList)

    dlo.observe(observer)

    return deferred

You can grab the code, read more about usage, and see several other examples at https://github.com/terrycojones/txdlo.

Promises are first-class objects for function calls

Thursday, September 12th, 2013

Have you ever programmed in a language without functions as first-class objects? You can’t return a function from a function, can’t pass a function as an argument, and you certainly can’t make anonymous functions on the fly. Remember how liberating, empowering, and flexible it felt when you moved to a language with functions as first-class objects?

Yesterday, after 7 years of working with deferreds/promises/futures (call them what you will), thinking carefully about them thousands of times, mailing and blogging about them, giving talks about them, and now writing a book about them, I realized there is a very simple way to look at them:

Promises are first-class objects for function calls.

In other words, a promise gives you a way to make a function call and to pass around that function call. Return it from a function, pass it as an argument, put it in a data structure, etc. Given a promise, you can arrange to process the result of the function call or its error.

To be a bit more abstract: A promise is a time-independent reification of a function call.

By “time-independent” I mean that if you get a promise from somewhere, you don’t have to worry whether the underlying function call has already completed, is currently in progress, or has yet to run. Depending on the promise implementation there may be no way to find out. The point is that you don’t have to care and you shouldn’t care.

That’s all I’ll say for now, except for a final comment on naming.

I think “promise” is a better name than “future” or “deferred” because the latter two feel more like they’re referring to an event yet to happen. A promise feels more neutral: you could easily be making a promise about something you’ve already done. Many explanations of promises/deferreds/futures stress that they are something that will hold the result of a computation that’s not yet completed. That’s certainly a common usage, but it’s only part of the picture. Describing promises as a reification of a function call takes the time factor completely out of the picture.

Here’s a small Javascript function to memoize a (single-argument) function to illustrate the point:

var memoize = function(fn) {
    var promises = {};

    return function(arg) {
        var promise = promises[arg];
        if (!promise) {
            promise = promises[arg] = $.when(fn(arg)).promise();
        }
        return promise;
    };
};

The memoization cache is full of promises, not underlying function results. Some of the function calls may have completed, some may be underway. I’m using the jQuery $.when function as a convenience (that’s an irrelevant detail, don’t let it distract you). The promises stay in the cache indefinitely (no eviction, for simplicity), holding promises for function calls from the past.

Time is not an issue here.

Specifically, and in contrast, think about what happens with non-promise memoization. What happens if a first call comes in with an argument X, but before fn(X) has finished computing there is another call with argument X? Then another and another and another? You wind up calling fn(X) many times. I.e., doing exactly the thing you were trying to avoid! And there’s nothing you can do about it.

If you’re interested to review the book I’m writing with Nicholas Tollervey on jQuery deferreds, please email me or leave a comment below. Most of the book is not really specific to jQuery’s flavor of deferreds.

Yet another cancelable Twisted deferred class

Thursday, June 20th, 2013

I’m posting this (completely untested!) to illustrate how one could write a class to provide a Twisted deferred-like class, identical to the twisted.internet.defer.Deferred class, but which lets you call callback, errback, or cancel on the instance yourself. Hopefully that will make some sense. If not, let me know in the comments and I’ll try to be clearer. There’s also discussion of this issue (again) in the Twisted mailing list right now. For context, the discussion thread from January 2010 where I first wrote a class like the below is here.

from twisted.internet.defer import CancelledError, Deferred
from twisted.python.failure import Failure

class ControllableDeferred2013(object):

    """A Deferred-like class that takes a regular Twisted Deferred and
    provides a deferred that can be fired at will. If you have a regular
    Twisted Deferred, you can produce a deferred you have more control over
    by using your Deferred instance to make an instance of this class.

    Any time you need to fire a ControllableDeferred2013 instance for any
    reason, call its callback, errback or cancel method. It will fire
    immediately with the value you provide and the original Deferred will
    be cancelled."""

    def __init__(self, originalDeferred):
        self._fired = False
        self._originalDeferred = originalDeferred
        self._newDeferred = Deferred()
        for method in ('addBoth', 'addCallback', 'addCallbacks', 'addErrback',
                       'chainDeferred'):
            setattr(self, method, getattr(self._newDeferred, method))
        originalDeferred.addBoth(self._originalFired)

    def _originalFired(self, result):
        if not self._fired:
            self._fired = True
            self._originalDeferred.chainDeferred(self._newDeferred)

    def cancel(self, value=None):
        if not self._fired:
            self._fired = True
            self._newDeferred.errback(Failure(CancelledError(value)))
            self._originalDeferred.cancel()

    def callback(self, result=None):
        if not self._fired:
            self._fired = True
            self._newDeferred.callback(result)
            self._originalDeferred.cancel()

    def errback(self, fail=None):
        if not self._fired:
            self._fired = True
            self._newDeferred.errback(fail)
            self._originalDeferred.cancel()

    def pause(self):
        self._newDeferred.pause()
        self._originalDeferred.pause()

    def unpause(self):
        self._newDeferred.unpause()
        self._originalDeferred.unpause()
 

Simpler Twisted deferred code via decorated callbacks

Sunday, October 14th, 2012

This morning I was thinking about Twisted deferreds and how people find them difficult to grasp, but how they’re conceptually simple once you get it. I guess most of us tell people a deferred is something to hold a result that hasn’t arrived yet. Sometimes, though, deferreds do have a result in them immediately (e.g., using succeed or fail to get an already-fired deferred).

I wondered if it might work to tell people to think of a deferred as really being the result. If that were literally true, then instead of writing:

d = getPage(...)
d.addErrback(errcheck, args)
d.addCallback(cleanup, args)
d.addCallback(reformat, args)
return d

We might write something like:

result1 = getPage(...)
result2 = errcheck(result1, args)
result3 = cleanup(result2, args)
return reformat(result3, args)

And if you could write that, you could obviously instead write:

return reformat(cleanup(errcheck(getPage(...), args), args), args)

If we could write Twisted code that way, I think using deferreds would be simpler for people unfamiliar with them. We could show them Twisted code and not even have to mention deferreds (see below).

In the style we’re all used to, the programmer manually adds callbacks and errbacks. That’s basically boilerplate. It gets worse when you then need to also use DeferredList, etc. It’s a little confusing to read deferred code at first, because you need to know that the deferred result/failure is automatically passed as the first arg to callbacks/errbacks. It seems to take a year or more for people to finally realize how the callback & errback chains actually interact :-) Also, I wonder how comfortable programmers are with code ordered innermost function first, as in the normal d.addCallback(inner).addCallback(outer) Twisted style, versus outer(inner()), as in the line above.

Anyway… I realized we CAN let people use the succinct style above, by putting boilerplate into decorators. I wrote two decorators, called (surprise!) callback and errback. You can do this:

@errback
def errcheck(failure, arg):
    ...

@callback
def cleanup(page, arg):
    ...

@callback
def reformat(page, arg):
    ...

reformat(cleanup(errcheck(getPage(...), arg1), arg2), arg3)

The deferred callback and errback chains are hooked up automatically. You still get a regular deferred back as the return value.

And… the “deferred” aspect of the code (or at least the need to talk about or explain it) has conveniently vanished.

You can also do things like

func1(getDeferred1(), errcheck(func2(getDeferred2(), getDeferred3())))

This gets the result of deferreds 2 & 3 and (if neither fails) passes the result of calling func2 on both results through to func1, which is called along with the result of deferred 1. You don’t need to use DeferredLists, as the decorator makes them for you. The errcheck function wont be called at all unless there’s an error.

That’s nice compared to the verbose equivalent:

d1 = DeferredList([getDeferred2(), getDeferred3()])
d1.addCallback(func2)
d1.addErrback(errcheck)
d2 = DeferredList([getDeferred1(), d1])
d2.addCallback(func1)

Or the more compact but awkward:

DeferredList([getDeferred(), DeferredList([getDeferred(), getDeferred()]).addCallback(
    func2).addErrback(errcheck)]).addCallback(func1)
 

There’s lots more that could be said about this, but that’s enough for now. The code (surely not bulletproof) and some tests are on Github. I’ll add a README sometime soon. This is still pretty much proof of concept, and some it could be done slightly differently. I’m happy to discuss in more detail if people are interested.

txdpce: a Twisted class for deferred parallel command execution

Tuesday, July 12th, 2011

I just uploaded a simple Twisted Python class, txdpce, to Launchpad. It’s designed for situations where you have multiple ways of obtaining a function result and you want to try them all in parallel and return the result from the fastest function. A typical case is that you can either compute a result via a network call or try to get it out of a cache (perhaps also via a network call, to memcached). You might also be able to compute it locally, etc.

Things can be more complicated than provided for here, of course. E.g., you might like to cache the result of a local call (if it finishes first or if the cache lookup fails). The txdpce class is supposed to be a simple demonstration. I wrote it for a bit of fun this morning and also because it’s yet another nice example of how you can click together the building blocks of Twisted to form increasingly sophisticated classes.

Here’s the class. You’ll find a test suite at the Launchpad site. You can download the code using bzr via bzr branch lp:txdpce.

from twisted.internet import defer
from twisted.python import failure

class ParallelCommandException(Exception):
    pass

class DeferredParallelCommandExecutor(object):
    """
    Provides a mechanism for the execution of a command by multiple methods,
    returning the result from the one that finishes first.
    "
""

    def __init__(self):
        self._functions = []

    def registerFunction(self, func):
        """
        Add func to the list of functions that will be run when execute is
        called.

        @param func: A callable that should be run when execute is called.
        """
        self._functions.append(func)

    def execute(self, *args, **kwargs):
        """
        Run all the functions in self._functions on the given arguments and
        keyword arguments.

        @param args: Arguments to pass to the registered functions.

        @param kwargs: Keyword arguments to pass to the registered functions.

        @raise RuntimeError: if no execution functions have been registered.

        @return: A C{Deferred} that fires when the first of the functions
        finishes.
        """
        if not self._functions:
            raise RuntimeError('No execution functions have been registered.')

        deferreds = [defer.maybeDeferred(func, *args, **kwargs)
                     for func in self._functions]
        d = defer.DeferredList(deferreds, fireOnOneCallback=1, consumeErrors=1)
        d.addCallback(self._done, deferreds)
        return d

    def _done(self, deferredListResult, deferreds):
        """
        Process the result of the C{DeferredList} execution of all the
        functions in self._functions. If result is a tuple, it's the result
        of a successful function, in which case we cancel all the other
        deferreds (none of which has finished yet) and give back the
        result.  Otherwise, all the function calls must have failed (since
        we passed fireOnOneCallback=True to the C{DeferredList} and we
        return a L{ParallelCommandException} containing the failures.

        @param deferredListResult: The result of a C{DeferredList} returned
        by self.execute firing.

        @param deferreds: A list of deferreds for other functions that were
        trying to compute the result.

        @return: Either the result of the first function to successfully
        compute the result or a C{failure.Failure} containing a
        L{ParallelCommandException} with a list of the failures from all
        functions that tried to get the result.
        """
        if isinstance(deferredListResult, tuple):
            # A tuple result means the DeferredList fired with a successful
            # result.  Cancel all other deferreds and return the result.
            result, index = deferredListResult
            for i in range(len(self._functions)):
                if i != index:
                    deferreds[i].cancel()
            return result
        else:
            # All the deferreds failed. Return a list of all the failures.
            failures = [fail for (result, fail) in deferredListResult]
            return failure.Failure(ParallelCommandException(failures))
 

A resizable dispatch queue for Twisted

Monday, June 27th, 2011

In December 2009 I posted to the Twisted mailing list about what I called a Resizable Dispatch Queue. I’ve just spent some time making a new version that’s much improved. You can pick up the new version via pip install txrdq, from PyPI, or from the txRDQ project page on Launchpad. Here’s the example use case, taken from my original posting:

You want to write a server with a web interface that allows people to enter their phone number so you can send them an SMS. You anticipate lots of people will use the service. But sending SMS messages is quite slow, and the company that you ship those jobs off to is concerned that you’ll overrun their service (or maybe they have an API limit, etc). So you need to queue up jobs locally and send them off at a certain rate. You’d like to be able to adjust that rate up or down. You also want to be able to shut your service down cleanly (i.e., not in the middle of a task), and when you restart it you want to be able to re-queue the jobs that were queued last time but which hadn’t gone out.

To make the example more concrete, suppose your function that sends the SMS is called sendSMS and that it takes a (number, message) tuple argument. Here are some of the kinds of things you can do:

from txrdq.rdq import ResizableDispatchQueue

# Create a queue that will allow 5 simultaneous underway jobs.
rdq = ResizableDispatchQueue(sendSMS, 5)

# Later... send off some SMS messages.
d1 = rdq.put((2127399921, 'Hello...'), priority=5)
d1.addCallback(...)

d2 = rdq.put((5052929919, 'Test...'), priority=5)
d2.addCallback(...)

# Cancel the second job
d2.cancel()

# Widen the outgoing pipeline to 10 simultaneous jobs.
rdq.width = 10

# We're dispatching jobs too fast, turn it down a little.
rdq.width = 7

# Get a copy of the list of pending jobs.
jobs = rdq.pending()

# Cancel one of the pending jobs from the jobs list.
job.cancel()

# Reprioritize one of the pending jobs from the jobs list.
rdq.reprioritize(job, -1)

# Arrange to increase the number of jobs in one hour.
reactor.callLater(3600, rdq.setWidth, 20)

# Pause processing.
rdq.pause()

# Resume processing, with a new width of 8.
rdq.resume(8)

# Shutdown. Wait for any underway jobs to complete, and save
# the list of jobs not yet processed.

def saveJobs(jobs):
    pickle.dump(jobs, ...)

d = rdq.stop()
d.addCallback(saveJobs)
 

I’ve come up with many uses for this class over the last 18 months, and have quite often recommended it to people on the #twisted IRC channel. Other examples include fetching a large number of URLs in a controlled way, making many calls to the Twitter API, etc.

Usage of the class is very simple. You create the dispatch queue, giving it a function that will be called to process all jobs. Then you just put job arguments as fast as you like. Each call to put gets you a Twisted Deferred instance. If your function runs successfully on the argument, the deferred will call back with an instance of txrdq.job.Job. The job contains information about when it was queued, when it was launched, when it stopped, and of course the result of the function. If your function hits an error or the job is canceled (by calling cancel on the deferred), the deferred will errback and the failure will again contain a job instance with the details.

It’s also useful to have an admin interface to the queue, so calls such as pending and underway are provided. These return lists of job instances. You can call cancel on a job instance too. You can reprioritize jobs. And you can pause processing or shut the queue down cleanly.

The code also contains an independently useful Twisted classes called DeferredPriorityQueue (which I plan to write about), and DeferredPool (which I described earlier).

How to asynchronously exchange a dictionary using Twisted deferreds

Friday, June 10th, 2011

Here’s a fun class that I can’t think of a good use for :-) But I like its simplicity and it’s another illustration of what I like to call asynchronous data structures.

Suppose you have a producer who’s building a dictionary and a consumer who wants to look things up in it. The producer is working at their own pace, making new dictionary entries in whatever work order that suits them, and the consumer is requesting dictionary items in whatever order they need them. The two orders are obviously extremely unlikely to be the same if the dictionary is of non-trivial size. How do you write an asynchronous server (or data structure) that sits between the producer and consumer?

Yes, it’s far fetched, perhaps, but here’s a simple asynchronous dictionary class that lets you do it gracefully:

from collections import defaultdict
from twisted.internet import defer

class DeferredDict(dict):
    def __init__(self, *args, **kwargs):
        self._deferreds = defaultdict(set)
        dict.__init__(self, *args, **kwargs)

    def __getitem__(self, item):
        try:
            return defer.succeed(dict.__getitem__(self, item))
        except KeyError:
            d = defer.Deferred()
            self._deferreds[item].add(d)
            return d

    def __setitem__(self, item, value):
        if item in self._deferreds:
            for d in self._deferreds[item]:
                d.callback(value)
            del self._deferreds[item]
        dict.__setitem__(self, item, value)
 

When a consumer tries to get an element from the dictionary, they always get a deferred. The deferred will fire with the value from the dictionary when (if) it becomes available. Of course if the value is already known, they get it in a deferred that has already fired (via succeed). When the producer puts an element into the dictionary, any consumer deferreds that were waiting on that element’s value are given the value.

Note that a __delitem__ isn’t needed, we just inherit that from dict. If a non-existent item is deleted, you get the normal dictionary behavior (a KeyError). If the item does exist, that means the list of waiting deferreds on that item is empty (the fact the item exists means any waiting deferreds for the item have all been fired and that that item in the self._deferreds dictionary was deleted), so we can just let the dictionary class delete the item, as usual.

Graceful shutdown of a Twisted service with outstanding deferreds

Friday, June 10th, 2011

I’ve been spending a bit of time thinking again about queues and services. I wrote a Twisted class in 2009 to maintain a resizable dispatch queue (code in Launchpad, description on the Twisted mailing list). For this post I’ve pulled out (and simplified slightly) one of its helper classes, a DeferredPool.

This simple class maintains a set of deferreds and gives you a mechanism to get a deferred that will fire when (if!) the size of the set ever drops to zero. This is useful because it can be used to gracefully shut down a service that has a bunch of outstanding requests in flight. For each incoming request (that’s handled via a deferred), you add the deferred to the pool. When a signal arrives to tell the service to stop, you stop taking new requests and ask the pool for a deferred that will fire when all the outstanding deferreds are done, then you exit. This can all be done elegantly in Twisted, the last part by having the stopService method return the deferred you get back from the pool (perhaps after you add more cleanup callbacks to it).

Here’s the code:

from twisted.internet import defer

class DeferredPool(object):
    """Maintains a pool of not-yet-fired deferreds and gives a mechanism to
    request a deferred that fires when the pool size goes to zero."
""

    def __init__(self):
        self._pool = set()
        self._waiting = []

    def _fired(self, result, d):
        """Callback/errback each pooled deferred runs when it fires. The
        deferred first removes itself from the pool. If the pool is then
        empty, fire all the waiting deferreds (which were returned by
        notifyWhenEmpty)."
""
        self._pool.remove(d)
        if not self._pool:
            waiting, self._waiting = self._waiting, []
            for waiter in waiting:
                waiter.callback(None)
        return result

    def add(self, d):
        """Add a deferred to the pool."""
        d.addBoth(self._fired, d)
        self._pool.add(d)

    def notifyWhenEmpty(self, testImmediately=True):
        """Return a deferred that fires (with None) when the pool empties.
        If testImmediately is True and the pool is empty, return an already
        fired deferred (via succeed)."
""
        if testImmediately and not self._pool:
            return defer.succeed(None)
        else:
            d = defer.Deferred()
            self._waiting.append(d)
            return d
 

As usual I’m posting this example because I find Twisted’s deferreds so elegant. Here are a few comments on the above that might help you understand deferreds better.

A frequent pattern when creating and managing deferreds is that you can add callbacks and errbacks to them yourself to transparently do some housekeeping when they fire. In this case, for each deferred passed to add, I’m adding a callback and an errback that will run self._fired when the deferred fires. The first thing that method does is take the deferred out of the pool of outstanding deferreds. So the deferred itself cleans up the pool. It does that transparently, by which I mean that the call/errback function (self._fired) always returns whatever result it was passed. It’s on both the callback and errback chains of the deferred and has no effect on the result. The deferred may already have call/errbacks on it when it is passed to add, and it may have them added to it after add is done. Whoever created and is otherwise using the deferred will be none the wiser and is in no way affected.

When a deferred in the pool fires, it also checks to see if the pool size is zero and if there are any deferreds waiting to be informed of that. If so, it fires all the waiting deferreds and empties the list of waiting deferreds. This doesn’t mean the action is necessarily over. More deferreds can be added, more waiters can be added, etc. The pool size can go to zero again and if there are no waiters are waiting, no big deal, etc.

It’s easy to add functionality to e.g., record what time deferreds were added, provide stats, allow outstanding deferreds to be cancelled, add notifications when high/low water marks are reached, etc. But that’s enough for now. Feel free to ask questions below.

Twisted’s Deferred class chainDeferred method is too simplistic?

Saturday, August 14th, 2010

I think it’s worth spending some time thinking about whether chainDeferred in twisted.internet.defer.Deferred is too simplistic. I’ve thought for a while that it could be more helpful in preventing people from doing unintended things and/or cause fewer surprises (e.g., see point #1 in this post).

Those are smaller things that people can obviously live with. But a more important one concerns deferred cancellation. See uncalled.py below for why.

Here are five examples of chainDeferred behavior that I think could be better. In an attempt to fix these, I made a few changes to a copy of defer.py (immodestly called tdefer.py, sorry) which you can grab, with runnable examples, here. The tdefer.py code is meant as a suggested approach. I doubt that it’s bulletproof.

Here are the examples.

boom1.py:

# Normal deferreds: this raises defer.AlreadyCalledError because
# the callback of d1 causes the callback of d2 to be called, but d2 has
# already been cancelled (and hence called).

# With tdefer.py: there is no error because d1.callback will not call
# d2 as it has already been cancelled.

def printCancel(fail):
    fail.trap(defer.CancelledError)
    print ‘cancelled’

def canceller(d):
    print ‘cancelling’

d1 = defer.Deferred()
d2 = defer.Deferred(canceller)
d2.addErrback(printCancel)
d1.chainDeferred(d2)
d2.cancel()
d1.callback(‘hey’)
 

boom2.py:

# Normally: raises defer.AlreadyCalledError because calling d1.callback
# will call d2, which has already been called.

# With tdefer.py: Raises AssertionError: "Can’t callback an already
# chained deferred" because calling callback on a deferred that’s
# already been chained is asking for trouble (as above).

d1 = defer.Deferred()
d2 = defer.Deferred()
d1.chainDeferred(d2)
d2.callback(‘hey’)
d1.callback(‘jude’)
 

uncalled.py:

# Normally: although d2 has been chained to d1, when d1 is cancelled,
# d2′s cancel method is never called. Even calling d2.cancel ourselves
# after the call to d1.cancel has no effect, as d2 has already been
# called.

# With tdefer: both cancel1 and cancel2 are called when d1.cancel is
# called. The additional final call to d2.cancel correctly has no
# effect as d2 has been called (via d1.cancel).

def cancel1(d):
    print ‘cancel one’

def cancel2(d):
    print ‘cancel two’

def reportCancel(fail, which):
    fail.trap(defer.CancelledError)
    print ‘cancelled’, which

d1 = defer.Deferred(cancel1)
d1.addErrback(reportCancel, ‘one’)
d2 = defer.Deferred(cancel2)
d2.addErrback(reportCancel, ‘two’)
d1.chainDeferred(d2)
d1.cancel()
d2.cancel()
 

unexpected1.py:

# Normally: prints "called: None", instead of the probably expected
# "called: hey"

# tdefer.py: prints "called: hey"

def called(result):
    print ‘called:’, result

d1 = defer.Deferred()
d2 = defer.Deferred()
d1.chainDeferred(d2)
d1.addCallback(called)
d1.callback(‘hey’)
 

unexpected2.py:

# Normally: prints
#   called 2: hey
#   called 3: None

# tdefer.py: prints
#   called 2: hey
#   called 3: hey

def report2(result):
    print ‘called 2:’, result

def report3(result):
    print ‘called 3:’, result

d1 = defer.Deferred()
d2 = defer.Deferred().addCallback(report2)
d3 = defer.Deferred().addCallback(report3)
d1.chainDeferred(d2)
d1.chainDeferred(d3)
d1.callback(‘hey’)
 

I wont go into detail here as this post is already long enough. Those are 3 classes of behavior arising from chainDeferred being very simplistic. Comments welcome, of course. Once again, the runnable code is here.

Asynchronous data structures with Twisted Deferreds

Friday, July 23rd, 2010

Earlier this week I gave a talk titled Deferred Gratification (slides) at EuroPython in Birmingham. The title was supposed to hint at how much I love Twisted‘s Deferred class, and that it took me some time to really appreciate it.

While thinking about Deferreds in the days before the talk, I think I put my finger on one of the reasons I find Deferreds so attractive and elegant. The thought can be neatly summarized: Deferreds let you replace synchronous data structures with elegant asynchronous ones.

There are various ways that people get exposed to thinking about programming in an asynchronous style. The great majority do so via building user interfaces, usually with widget libraries wherein one specifies widgets and layouts, sets up event handlers, and then surrenders control to a main loop that thereafter routes events to handlers. Exposure to asynchronous programming via building UIs is becoming much more common as Javascript programmers build client-side web apps that operate asynchronously with web servers (and the “A” in AJAX of course stands for asynchronous). Others become aware of asynchronous programming via writing network code (perhaps using Twisted). Relatively few become aware of asynchronous programming via doing async filesystem I/O.

Because Twisted’s Deferreds don’t have anything to do with UIs or networking or filesystems, you can use them to implement other asynchronous things, like an asynchronous data structure. To show you what I mean, here’s a slightly simplified version of Twisted’s DeferredQueue class, taken from twisted/internet/defer.py:

class DeferredQueue(object):

    def __init__(self):
        # It would be better if these both used collections.deque (see comments section below).
        self.waiting = [] # Deferreds that are expecting an object
        self.pending = [] # Objects queued to go out via get.

    def put(self, obj):
        if self.waiting:
            self.waiting.pop(0).callback(obj)
        else:
            self.pending.append(obj)

    def get(self):
        if self.pending:
            return succeed(self.pending.pop(0))
        else:
            d = Deferred()
            self.waiting.append(d)
            return d
 

I find this extremely elegant, and I’m going to explain why.

But first, think about code for a regular synchronous queue. What happens if you call get on a regular queue that’s empty? Almost certainly one of two things: you’ll get some kind of QueueEmpty error, or else your code will block until some other code puts something into the queue. I.e., you either get a synchronous error or you get a synchronous non-empty response.

If you look at the get method in the code above, you’ll see that if the queue is empty (i.e., the pending list is empty), a new Deferred is made, added to self.waiting, and is immediately returned to the caller. So code calling get on an empty queue doesn’t get an error and doesn’t block, it always gets a result back essentially immediately. How can you get a result from an empty queue? Easy: the result is a Deferred. And because we’re in the asynchronous world, you just attach callbacks (like event handlers in the UI world) to the Deferred, and go on your merry way.

If you can follow that thinking, the rest of the code in the class above should be easy to grok. In put, if there are any outstanding Deferreds (i.e., earlier callers who hit an empty queue and got a Deferred back), the incoming object is given to the first of these by passing it to the callback function of the Deferred (and popping it out of the waiting list). If there are no outstanding Deferreds expecting a result, the incoming object is simply appended to self.pending. On the get side, if the queue (i.e., self.pending) is non-empty, the code creates a Deferred that has already been fired (using the succeed helper function) and returns it.

By now, this kind of code seems routine and straightforward to me. But it certainly wasn’t always that way. So if the above seems cryptic or abstract, I encourage you to think it over, maybe write some code, ask questions below, etc. To my mind, these kinds of constructs – simple, elegant, robust, practical, obviously(?) bug-free, single-threaded, etc. – are extremely instructive. You too can solve potentially gnarly (at least in the threaded world) networking coding challenges in very simple ways just by building code out of simple Twisted Deferreds (there’s nothing special about the DeferredQueue class, it’s just built using primitive Deferreds).

For extra points, think about the concept of asynchronous data structures. I find that concept very attractive, and putting my finger on it helped me to see part of the reason why I think Deferreds are so great. (In fact, as you can see from the use of succeed, Deferreds are not specifically asynchronous or synchronous: they’re so simple they don’t even care, and as a consumer of Deferreds, your code doesn’t have to either). That’s all remarkably elegant.

I sometimes hope O’Reilly will publish a second volume of Beautiful Code and ask me to write a chapter. I know exactly what I’d write about :-)

Twisted code for retrying function calls

Thursday, November 12th, 2009

These days I often find myself writing code to talk to services that are periodically briefly unavailable. An error of some kind occurs and the correct (and documented) action to take is just to retry the original call a little later. Examples include using Amazon’s S3 service and the Twitter API. In both of these services, transient failures happen fairly frequently.

So I wrote the Twisted class below to retry calls, and tried to make it fairly general. I’d be happy to hear comments on it, because it’s pretty simple and if it can be made bullet proof I imagine others will use it too.

In case you’re not familiar with Twisted and it’s not clear, the call retrying in the below is scheduled by the Twisted reactor. This all asynchronous event-based code that will not block (assuming the function you pass in also does not).

First off, here’s the class that handles the calling:

from twisted.internet import reactor, defer, task
from twisted.python import log, failure

class RetryingCall(object):
    """Calls a function repeatedly, passing it args and kw args. Failures
    are passed to a user-supplied failure testing function. If the failure
    is ignored, the function is called again after a delay whose duration
    is obtained from a user-supplied iterator. The start method (below)
    returns a deferred that fires with the eventual non-error result of
    calling the supplied function, or fires its errback if no successful
    result can be obtained before the delay backoff iterator raises
    StopIteration.
    "
""
    def __init__(self, f, *args, **kw):
        self._f = f
        self._args = args
        self._kw = kw
       
    def _err(self, fail):
        if self.failure is None:
            self.failure = fail
        try:
            fail = self._failureTester(fail)
        except:
            self._deferred.errback()
        else:
            if isinstance(fail, failure.Failure):
                self._deferred.errback(fail)
            else:
                log.msg(‘RetryingCall: Ignoring %r’ % (fail,))
                self._call()

    def _call(self):
        try:
            delay = self._backoffIterator.next()
        except StopIteration:
            log.msg(‘StopIteration in RetryingCall: ran out of attempts.’)
            self._deferred.errback(self.failure)
        else:
            d = task.deferLater(reactor, delay,
                                self._f, *self._args, **self._kw)
            d.addCallbacks(self._deferred.callback, self._err)

    def start(self, backoffIterator=None, failureTester=None):
        self._backoffIterator = iter(backoffIterator or simpleBackoffIterator())
        self._failureTester = failureTester or (lambda _: None)
        self._deferred = defer.Deferred()
        self.failure = None
        self._call()
        return self._deferred
 

You call the constructor with your function and the args it should be called with. Then you call start() to get back a deferred that will eventually fire with the result of the call, or an error. BTW, I called it “start” to mirror twisted.internet.task.LoopingCall.

There’s a helper function for producing successive inter-call delays:

from operator import mul
from functools import partial

def simpleBackoffIterator(maxResults=10, maxDelay=120.0, now=True,
                          initDelay=0.01, incFunc=None):
    assert maxResults > 0
    remaining = maxResults
    delay = initDelay
    incFunc = incFunc or partial(mul, 2.0)
   
    if now:
        yield 0.0
        remaining -= 1
       
    while remaining > 0:
        yield (delay if delay < maxDelay else maxDelay)
        delay = incFunc(delay)
        remaining -= 1
 

By default this will generate the sequence of inter-call delays 0.0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56 and it should be easy to see how you could write your own. Or you can just supply a list, etc. When the backoff iterator finishes, the RetryingCall class gives up on trying to get a non-error result from the function. In that case errback is called on the deferred that start() returns, with the failure from the first call.

You get to specify a function for testing failures. If it ever raises or returns a failure, the start() deferred’s errback is called. The failure tester can just ignore whatever failures should be considered transient.

So, for example, if you were calling S3 and wanted to ignore 504 errors, you could supply a failureTester arg like this:

    from twisted.web import error, http

    def test(self, failure):
        failure.trap(error.Error)
        if int(failure.value.status) != http.GATEWAY_TIMEOUT:
            return failure
 

As another example, while using the Twitter API you might want to allow a range of HTTP errors and also exactly one 404 error, seeing as a 404 might be an error on the part of Twitter (I don’t mean to suggest that actually happens). It’s probably definitive – but, why not try it once again just to be more sure? So, pass RetryingCall a failureTester that’s an instance of a class like this:

class TwitterFailureTester(object):
    okErrs = (http.INTERNAL_SERVER_ERROR,
              http.BAD_GATEWAY,
              http.SERVICE_UNAVAILABLE)

    def __init__(self):
        self.seen404 = False

    def __call__(self, failure):
        failure.trap(error.Error)
        status = int(failure.value.status)
        if status == http.NOT_FOUND:
            if self.seen404:
                return failure
            else:
                self.seen404 = True
        elif status not in self.okErrs:
            return failure
 

Changing existing code to use RetryingCall is pretty trivial. Take something like this

from twisted.web import client

def getUserByScreenname(screenname):
    d = client.getPage(
        ‘http://twitter.com/users/show.json?screen_name=glyf’)
    return d
 

and change it to look like this:

def getUserByScreenname(screenname):
    r = RetryingCall(client.getPage,
        ‘http://twitter.com/users/show.json?screen_name=glyf’)
    d = r.start(failureTester=TwitterFailureTester())
    return d
 

I wrote this about 10 days ago and posted it to the Twisted mailing list. No-one replied to say how horrible the code is or that it shoud be done another way, which is a pretty good sign. The above includes an improvement suggested by Tim Allen, and is slightly more useful than the code I posted originally (see the thread on the Twisted list for details).

Fault-tolerant Python Twisted classes for getting all Twitter friends or followers

Thursday, October 22nd, 2009

It’s been forever since I blogged here. I just wrote a little Python to grab all of a user’s friends or followers (or just their user ids). It uses Twisted, of course. There were two main reasons for doing this: 1) I want all friends/followers, not just the first bunch returned by the Twitter API, and 2) I wanted code that is fairly robust in the face of various 50x HTTP errors (I regularly experience INTERNAL_SERVER_ERROR, BAD_GATEWAY, and SERVICE_UNAVAILABLE).

If you want to use the code below and you’re not familiar with the Twitter API, consider whether you can use the FriendsIdFetcher and FollowersIdFetcher classes as they’ll do far fewer requests (you get 5000 results per API call, instead of 100). If you can live with user ids and do the occasional fetch of a full user, you’ll probably do far fewer API calls.

For the FriendsFetcher and FollowersFetcher classes, you get back a list of dictionaries, one per user. For FriendsIdFetcher and FollowersIdFetcher you get a list of Twitter user ids.

Of course there’s no documentation. Feel free to ask questions in the comments. Download the source.

import sys

from twisted.internet import defer
from twisted.web import client, error, http
   
if sys.hexversion >= 0x20600f0:
    import json
else:
    import simplejson as json

class _Fetcher(object):
    baseURL = ‘http://twitter.com/’
    URITemplate = None # Override in subclass.
    dataKey = None # Override in subclass.
    maxErrs = 10
    okErrs = (http.INTERNAL_SERVER_ERROR,
              http.BAD_GATEWAY,
              http.SERVICE_UNAVAILABLE)
   
    def __init__(self, name):
        assert self.baseURL.endswith(‘/’)
        self.results = []
        self.errCount = 0
        self.nextCursor = -1
        self.deferred = defer.Deferred()
        self.URL = self.baseURL + (self.URITemplate % { ‘name’ : name })

    def _fail(self, failure):
        failure.trap(error.Error)
        self.errCount += 1
        if (self.errCount < self.maxErrs and
            int(failure.value.status) in self.okErrs):
            self.fetch()
        else:
            self.deferred.errback(failure)
       
    def _parse(self, result):
        try:
            data = json.loads(result)
            self.nextCursor = data.get(‘next_cursor’)
            self.results.extend(data[self.dataKey])
        except Exception:
            self.deferred.errback()
        else:
            self.fetch()
           
    def _deDup(self):
        raise NotImplementedError(‘Override _deDup in subclasses.’)

    def fetch(self):
        if self.nextCursor:
            d = client.getPage(self.URL + ‘?cursor=%s’ % self.nextCursor)
            d.addCallback(self._parse)
            d.addErrback(self._fail)
        else:
            self.deferred.callback(self._deDup())
        return self.deferred

class _FriendsOrFollowersFetcher(_Fetcher):
    dataKey = u‘users’
   
    def _deDup(self):
        seen = set()
        result = []
        for userdict in self.results:
            uid = userdict[‘id’]
            if uid not in seen:
                result.append(userdict)
                seen.add(uid)
        return result

class _IdFetcher(_Fetcher):
    dataKey = u‘ids’
   
    def _deDup(self):
        # Keep the ids in the order we received them.
        seen = set()
        result = []
        for uid in self.results:
            if uid not in seen:
                result.append(uid)
                seen.add(uid)
        return result

class FriendsFetcher(_FriendsOrFollowersFetcher):
    URITemplate = ‘statuses/friends/%(name)s.json’

class FollowersFetcher(_FriendsOrFollowersFetcher):
    URITemplate = ‘statuses/followers/%(name)s.json’

class FriendsIdFetcher(_IdFetcher):
    URITemplate = ‘friends/ids/%(name)s.json’

class FollowersIdFetcher(_IdFetcher):
    URITemplate = ‘followers/ids/%(name)s.json’
 

Usage is dead simple:

fetcher = FriendsFetcher(‘terrycojones’)
d = fetcher.fetch()
d.addCallback(….) # etc.
 

Enjoy.

Facebook release Tornado and it’s not based on Twisted?

Saturday, September 12th, 2009

Image: Jay Smith

Image: Jay Smith

To their great credit, Facebook have just open-sourced more of their core software. This time it’s Tornado, an asynchronous web server written in Python.

Surely that can only mean one thing: Tornado is based on Twisted. Right?

Incredibly, no. Words fail me on this one. I’ve spent some hours today trying to put my thoughts into order so I could put together a reasonably coherent blog post on the subject. But I’ve failed. So here are some unstructured thoughts.

First of all, I’m not meaning to bash Facebook on this. At Fluidinfo we use their Thrift code. We’ll almost certainly use Scribe for logging at some point, and we’re seriously considering using Cassandra. Esteve Fernandez has put a ton of work into txAMQP to glue together Thrift, Twisted, and AMQP, and in the process became a Thrift committer.

Second, you can understand—or make an educated guess at—what happened: the Facebook programmers, like programmers everywhere, were strongly tempted to just write their own code instead of dealing with someone else’s. It’s not just about learning curves and fixing deficiencies, there are also issues of speed of changes and of control. At Fluidinfo we suffered through six months of maintaining our own set of related patches to Thrift before the Twisted support Esteve wrote was finally merged to trunk. That was painful and the temptation to scratch our own itch, fork, and forget about the official Thrift project was high.

Plus, Twisted suffers from the fact that the documentation is not as good as it could be. I commented at length on this over three years ago. Please read the follow-up posts in that thread for an illustration (one of many) of the maturity of the people running Twisted. Also note that the documentation has improved greatly since then. Nevertheless, Twisted is a huge project, it has tons of parts, and it’s difficult to document and to wrap your head around no matter what.

So you can understand why Facebook might have decided not to use Twisted. In their words:

We ended up writing our own web server and framework after looking at existing servers and tools like Twisted because none matched both our performance requirements and our ease-of-use requirements.

I’m betting it’s that last part that’s the key to the decision not to use Twisted.

But seriously…… WTF?

Twisted is an amazing piece of work, written by some truly brilliant coders, with huge experience doing exactly what Facebook set out to reinvent.

This is where I’m at a loss for words. I think: “what an historic missed opportunity” and “reinventing the wheel, badly” and “no, no, no, this cannot be” and “this is just so short-sighted” and “a crying shame” and many things besides.

Sure, Twisted is difficult to grok. But that’s no excuse. It’s a seriously cool and powerful framework, it’s vastly more sophisticated and useful and extensible than what Facebook have cobbled together. Facebook could have worked to improve twisted.web (which everyone agrees has some shortcomings) which could have benefitted greatly from even a small fraction of the resources Facebook must have put into Tornado. The end result would have been much better. Or Facebook could have just ignored twisted.web and built directly on top of the Twisted core. That would have been great too.

Or Facebook could have had a team of people who knew how to do it better, and produced something better than Twisted. I guess that’s the real frustration here – they’ve put a ton of effort into building something much more limited in scope and vision, and even the piece that they did build looks like a total hack built to scratch their short term needs.

What’s the biggest change in software engineering over the last decade? Arguably it’s the rise of test-driven development. I’m not the only one who thinks so. Yet here we are in late 2009 and Facebook have released a major piece of code with no test suite. Amazing. OK, you could argue this is a minor thing, that it’s not core to Tornado. That argument has some weight, but it’s hard to think that this thing is not a hack.

If you decide to use an asynchronous web framework, do you expect to have to manually set your sockets to be non-blocking? Do you feel like catching EWOULDBLOCK and EAGAIN yourself? Those sorts of things, and their non-portability (even within the world of Linux) are precisely the kinds of things that lead people away from writing C and towards doing rapid development using something robust and portable that looks after the details. They’re precisely the things Twisted takes care of for you, and which (at least in Twisted) work across platforms, including Windows.

It looks like Tornado are using a global reactor, which the Twisted folks have learned the hard way is not the best solution.

Those are just some of the complaints I’ve heard and seen in the Tornado code. I confess I’ve looked only superficially at their code – but more than enough to feel such a sense of lost opportunity. They built a small subsection of Twisted, they’ve done it with much less experience and elegance and hiding of detail than the Twisted code, and the thing doesn’t even come with a test suite. Who knows if it actually works, or when, or where, etc.?

And…. Twisted is so much more. HTTP is just one of many protocols Twisted speaks, including (from their home page): “TCP, UDP, SSL/TLS, multicast, Unix sockets, a large number of protocols including HTTP, NNTP, IMAP, SSH, IRC, FTP, and others”.

Want to build a sophisticated, powerful, and flexible asynchronous internet service in Python? Use Twisted.

A beautiful thing about Twisted is that it expands your mind. Its abstractions (particularly the clean separation and generality of transports, protocols, factories, services, and Deferreds—see here and here and here) makes you a better programmer. As I wrote to some friends in April 2006: “Reading the Twisted docs makes me feel like my brain is growing new muscles.”

Twisted’s deferreds are extraordinarily elegant and powerful, I’ve blogged and posted to the Twisted mailing list about them on multiple occasions. Learning to think in the Twisted way has been a programming joy to me over the last 3 years, so you can perhaps imagine my dismay that a company with the resources of Facebook couldn’t be bothered to get behind it and had to go reinvent the wheel, and do it so poorly. What a pity.

In my case, I threw away an entire year of C code in order to use Twisted in FluidDB. That was a decision I didn’t take lightly. I’d written my own libraries to do lots of low level network communications and RPC – including auto-generating server and client side glue libraries to serialize and unserialize RPC calls and args (a bit like Thrift), plus a server and tons of other stuff. I chucked it because it was too brittle. It was too much of a hack. It wasn’t portable enough. It was too get the details right. It wasn’t extensible.

In other words….. it was too much like Tornado! So I threw it all away in favor of Twisted. As I happily tell people, FluidDB is written in Python so it can use Twisted. It was a question of an amazingly great asynchronous networking framework determining the choice of programming language. And this was done in spite of the fact that I thought the Twisted docs sucked badly. The people behind Twisted were clearly brilliant and the community was great. There was an opportunity to make a bet on something and to contribute. I wish Facebook had made the same decision. It’s everyone’s loss that they did not. What a great pity.

A mixin class allowing Python __init__ methods to work with Twisted deferreds

Monday, May 11th, 2009

I posted to the Python Twisted list back in Nov 2008 with subject: A Python metaclass for Twisted allowing __init__ to return a Deferred

Briefly, I was trying to find a nice way to allow the __init__ method of a class to work with deferreds in such a way that methods of the class could use work done by __init__ safe in the knowledge that the deferreds had completed. E.g., if you have

class X(object):
    def __init__(self, host, port):
        def final(connection):
            self.db = connection
        d = makeDBConnection(host, port)
        d.addCallback(final)

    def query(self, q):
        return self.db.runQuery(q)
 

Then when you make an X and call query on it, there’s a chance the deferred wont have fired, and you’ll get an error. This is just a very simple illustrative example. There are many more, and this is a general problem of the synchronous world (in which __init__ is supposed to prepare a fully-fledged class instance and cannot return a deferred) meeting the asynchronous world in which, as Twisted programmers, we would like to (and must) use deferreds.

The earlier thread, with lots of useful followups can be read here. Although I learned a lot in that thread, I wasn’t completely happy with any of the solutions. Some of the things that still bugged me are in posts towards the end of the thread (here and here).

The various approaches we took back then all boiled down to waiting for a deferred to fire before the class instance was fully ready to use. When that happened, you had your instance and could call its methods.

I had also thought about an alternate approach: having __init__ add a callback to the deferreds it dealt with to set a flag in self and then have all dependent methods check that flag to see if the class instance was ready for use. But that 1) is ugly (too much extra code); 2) means the caller has to be prepared to deal with errors due to the class instance not being ready, and 3) adds a check to every method call. It would look something like this:

class X(object):
    def __init__(self, host, port):
        self.ready = False
        def final(connection):
            self.db = connection
            self.ready = True
        d = makeDBConnection(host, port)
        d.addCallback(final)

    def query(self, q):
        if not self.ready:
            raise IAmNotReadyException()
        return self.db.runQuery(q)
 

That was too ugly for my taste, for all of the above reasons, most especially for forcing the unfortunate caller of my code to handle IAmNotReadyException.

Anyway…. fast forward 6 months and I’ve hit the same problem again. It’s with existing code, in which I would like an __init__ to call something that (now, due to changes elsewhere) returns a deferred. So I started thinking again, and came up with a much cleaner way to do the alternate approach via a class mixin:

from twisted.internet import defer

class deferredInitMixin(object):
    def wrap(self, d, *wrappedMethods):
        self.waiting = []
        self.stored = {}

        def restore(_):
            for method in self.stored:
                setattr(self, method, self.stored[method])
            for d in self.waiting:
                d.callback(None)

        def makeWrapper(method):
            def wrapper(*args, **kw):
                d = defer.Deferred()
                d.addCallback(lambda _: self.stored[method](*args, **kw))
                self.waiting.append(d)
                return d
            return wrapper

        for method in wrappedMethods:
            self.stored[method] = getattr(self, method)
            setattr(self, method, makeWrapper(method))

        d.addCallback(restore)
 

You use it as in the class Test below:

from twisted.internet import defer, reactor

def fire(d, value):
    print "I finally fired, with value", value
    d.callback(value)

def late(value):
    d = defer.Deferred()
    reactor.callLater(1, fire, d, value)
    return d

def called(result, what):
    print ‘final callback of %s, result = %s’ % (what, result)

def stop(_):
    reactor.stop()

class Test(deferredInitMixin):
    def __init__(self):
        d = late(‘Test’)
        deferredInitMixin.wrap(self, d, ‘f1′, ‘f2′)

    def f1(self, arg):
        print "f1 called with", arg
        return late(arg)

    def f2(self, arg):
        print "f2 called with", arg
        return late(arg)

if __name__ == ‘__main__’:
    t = Test()
    d1 = t.f1(44)
    d1.addCallback(called, ‘f1′)
    d2 = t.f1(33)
    d2.addCallback(called, ‘f1′)
    d3 = t.f2(11)
    d3.addCallback(called, ‘f2′)
    d = defer.DeferredList([d1, d2, d3])
    d.addBoth(stop)
    reactor.run()
 

Effectively, the __init__ of my Test class asks deferredInitMixin to temporarily wrap some of its methods. deferredInitMixin stores the original methods away and replaces each of them with a function that immediately returns a deferred. So after __init__ finishes, code that calls the now-wrapped methods of the class instance before the deferred has fired will get a deferred back as usual (but see * below). As far as they know, everything is normal. Behind the scenes, deferredInitMixin has arranged for these deferreds to fire only after the deferred passed from __init__ has fired. Once that happens, deferredInitMixin also restores the original functions to the instance. As a result there is no overhead later to check a flag to see if the instance is ready to use. If the deferred from __init__ happens to fire before any of the instance’s methods are called, it will simply restore the original methods. Finally (obviously?) you only pass the method names to deferredInitMixin that depend on the deferred in __init__ being done.

BTW, calling the methods passed to deferredInitMixin “wrapped” isn’t really accurate. They’re just temporarily replaced.

I quite like this approach. It’s a second example of something I posted about here, in which a pool of deferreds is accumulated and all fired when another deferred fires. It’s nice because you don’t reply with an error and there’s no need for locking or other form of coordination – the work you need done is already in progress, so you get back a fresh deferred and everything goes swimmingly.

* Minor note: the methods you wrap should probably be ones that already return deferreds. That way you always get a deferred back from them, whether they’re temporarily wrapped or not. The above mixin works just fine if you ask it to wrap non-deferred-returning functions, but you have to deal with the possibility that they will return a deferred (i.e., if you call them while they’re wrapped).

A kinder and more consistent defer.inlineCallbacks

Friday, November 21st, 2008

Here’s a suggestion for making Twisted‘s inlineCallbacks function decorator more consistent and less confusing. Let’s suppose you’re writing something like this:

    @inlineCallbacks
    def func():
        # Do something.

    result = func()
 

There are 2 things that could be better, IMO:

1. func may not yield. In that case, you get an AttributeError when inlineCallbacks tries to send() to something that’s not a generator. Or worse, the call to send might actually work, and do who knows what. I.e., func() could return an object with a send method but which is not a generator. For some fun, run some code that calls the following decorated function (see if you can figure out what will happen before you do):

    @defer.inlineCallbacks
    def f():
        class yes():
            def send(x, y):
                print ‘yes’
                # accidentally_destroy_the_universe_too()
        return yes()
 

2. func might raise before it get to its first yield. In that case you’ll get an exception thrown when the inlineCallbacks decorator tries to create the wrapper function:

    File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 813, in unwindGenerator
      return _inlineCallbacks(None, f(*args, **kwargs), Deferred())

There’s a simple and consistent way to handle both of these. Just have inlineCallbacks do some initial work based on what it has been passed:

    def altInlineCallbacks(f):
        def unwindGenerator(*args, **kwargs):
            deferred = defer.Deferred()
            try:
                result = f(*args, **kwargs)
            except Exception, e:
                deferred.errback(e)
                return deferred
            if isinstance(result, types.GeneratorType):
                return defer._inlineCallbacks(None, result, deferred)
            deferred.callback(result)
            return deferred

        return mergeFunctionMetadata(f, unwindGenerator)
 

This has the advantage that (barring e.g., a KeyboardInterrupt in the middle of things) you’ll *always* get a deferred back when you call an inlineCallbacks decorated function. That deferred might have already called or erred back (corresponding to cases 1 and 2 above).

I’m going to use this version of inlineCallbacks in my code. There’s a case for it making it into Twisted itself: inlinecallbacks is already cryptic enough in its operation that anything we can do to make its operation more uniform and less surprising, the better.

You might think that case 1 rarely comes up. But I’ve hit it a few times, usually when commenting out sections of code for testing. If you accidentally comment out the last yield in func, it no longer returns a generator and that causes a different error.

And case 2 happens to me too. Having inlinecallbacks try/except the call to func is nicer because it means I don’t have to be quite so defensive in coding. So instead of me having to write

    try:
        d = func()
    except Exception:
        # Do something.
 

and try to figure out what happened if an exception fired, I can just write d = func() and add errbacks as I please (they then have to figure out what happened). The (slight?) disadvantage to my suggestion is that with the above try/except fragment you can tell if the call to func() raised before ever yielding. You can detect that, if you need to, with my approach if you’re not offended by looking at d.called immediately after calling func.

The alternate approach also helps if you’re a novice, or simply being lazy/careless/forgetful, and writing:

    d = func()
    d.addCallback(ok)
    d.addErrback(not_ok)
 

thinking you have your ass covered, but you actually don’t (due to case 2).

There’s some test code here that illustrates all this.

A Python metaclass for Twisted allowing __init__ to return a Deferred

Monday, November 3rd, 2008

OK, I admit, this is geeky.

But we’ve all run into the situation in which you’re using Python and Twisted, and you’re writing a new class and you want to call something from the __init method that returns a Deferred. This is a problem. The __init method is not allowed to return a value, let alone a Deferred. While you could just call the Deferred-returning function from inside your __init, there’s no guarantee of when that Deferred will fire. Seeing as you’re in your __init method, it’s a good bet that you need that function to have done its thing before you let anyone get their hands on an instance of your class.

For example, consider a class that provides access to a database table. You want the __init__ method to create the table in the db if it doesn’t already exist. But if you’re using Twisted’s twisted.enterprise.adbapi class, the runInteraction method returns a Deferred. You can call it to create the tables, but you don’t want the instance of your class back in the hands of whoever’s creating it until the table is created. Otherwise they might call a method on the instance that expects the table to be there.

A cumbersome solution would be to add a callback to the Deferred you get back from runInteraction and have that callback add an attribute to self to indicate that it is safe to proceed. Then all your class methods that access the db table would have to check to see if the attribute was on self, and take some alternate action if not. That’s going to get ugly very fast plus, your caller has to deal with you potentially not being ready.

I ran into this problem a couple of days ago and after scratching my head for a while I came up with an idea for how to solve this pretty cleanly via a Python metaclass. Here’s the metaclass code:

from twisted.internet import defer

class TxDeferredInitMeta(type):
    def __new__(mcl, classname, bases, classdict):
        hidden = ‘__hidden__’
        instantiate = ‘__instantiate__’
        for name in hidden, instantiate:
            if name in classdict:
                raise Exception(
                    ‘Class %s contains an illegally-named %s method’ %
                    (classname, name))
        try:
            origInit = classdict[‘__init__’]
        except KeyError:
            origInit = lambda self: None
        def newInit(self, *args, **kw):
            hiddenDict = dict(args=args, kw=kw, __init__=origInit)
            setattr(self, hidden, hiddenDict)
        def _instantiate(self):
            def addSelf(result):
                return (self, result)
            hiddenDict = getattr(self, hidden)
            d = defer.maybeDeferred(hiddenDict[‘__init__’], self,
                                    *hiddenDict[‘args’], **hiddenDict[‘kw’])
            return d.addCallback(addSelf)
        classdict[‘__init__’] = newInit
        classdict[instantiate] = _instantiate
        return super(TxDeferredInitMeta, mcl).__new__(
            mcl, classname, bases, classdict)
 

I’m not going to explain what it does here. If it’s not clear and you want to know, send me mail or post a comment. But I’ll show you how you use it in practice. It’s kind of weird, but it makes sense once you get used to it.

First, we make a class whose metaclass is TxDeferredInitMeta and whose __init__ method returns a deferred:

class MyClass(object):
    __metaclass__ = TxDeferredInitMeta
    def __init__(self):
        d = aFuncReturningADeferred()
        return d
 

Having __init__ return anything other than None is illegal in normal Python classes. But this is not a normal Python class, as you will now see.

Given our class, we use it like this:

def cb((instance, result)):
    # instance is an instance of MyClass
    # result is from the callback chain of aFuncReturningADeferred
    pass

d = MyClass()
d.__instantiate__()
d.addCallback(cb)
 

That may look pretty funky, but if you’re used to Twisted it wont seem too bizarre. What’s happening is that when you ask to make an instance of MyClass, you get back an instance of a regular Python class. It has a method called __instantiate__ that returns a Deferred. You add a callback to that Deferred and that callback is eventually passed two things. The first is an instance of MyClass, as you requested. The second is the result that came down the callback chain from the Deferred that was returned by the __init__ method you wrote in MyClass.

The net result is that you have the value of the Deferred and you have your instance of MyClass. It’s safe to go ahead and use the instance because you know the Deferred has been called. It will probably seem a bit odd to get your instance later as a result of a Deferred firing, but that’s perfectly in keeping with the Twisted way.

That’s it for now. You can grab the code and a trial test suite to put it through its paces at http://foss.fluidinfo.com/txDeferredInitMeta.zip. The code could be cleaned up somewhat, and made more general. There is a caveat to using it – your class can’t have __hidden__ or __instantiate__ methods. That could be improved. But I’m not going to bother for now, unless someone cares.