Archive for the ‘twisted’ Category

Twisted’s Deferred class chainDeferred method is too simplistic?

Saturday, August 14th, 2010

I think it’s worth spending some time thinking about whether chainDeferred in twisted.internet.defer.Deferred is too simplistic. I’ve thought for a while that it could be more helpful in preventing people from doing unintended things and/or cause fewer surprises (e.g., see point #1 in this post).

Those are smaller things that people can obviously live with. But a more important one concerns deferred cancellation. See uncalled.py below for why.

Here are five examples of chainDeferred behavior that I think could be better. In an attempt to fix these, I made a few changes to a copy of defer.py (immodestly called tdefer.py, sorry) which you can grab, with runnable examples, here. The tdefer.py code is meant as a suggested approach. I doubt that it’s bulletproof.

Here are the examples.

boom1.py:

# Normal deferreds: this raises defer.AlreadyCalledError because
# the callback of d1 causes the callback of d2 to be called, but d2 has
# already been cancelled (and hence called).

# With tdefer.py: there is no error because d1.callback will not call
# d2 as it has already been cancelled.

def printCancel(fail):
    fail.trap(defer.CancelledError)
    print ‘cancelled’

def canceller(d):
    print ‘cancelling’

d1 = defer.Deferred()
d2 = defer.Deferred(canceller)
d2.addErrback(printCancel)
d1.chainDeferred(d2)
d2.cancel()
d1.callback(‘hey’)
 

boom2.py:

# Normally: raises defer.AlreadyCalledError because calling d1.callback
# will call d2, which has already been called.

# With tdefer.py: Raises AssertionError: "Can’t callback an already
# chained deferred" because calling callback on a deferred that’s
# already been chained is asking for trouble (as above).

d1 = defer.Deferred()
d2 = defer.Deferred()
d1.chainDeferred(d2)
d2.callback(‘hey’)
d1.callback(‘jude’)
 

uncalled.py:

# Normally: although d2 has been chained to d1, when d1 is cancelled,
# d2′s cancel method is never called. Even calling d2.cancel ourselves
# after the call to d1.cancel has no effect, as d2 has already been
# called.

# With tdefer: both cancel1 and cancel2 are called when d1.cancel is
# called. The additional final call to d2.cancel correctly has no
# effect as d2 has been called (via d1.cancel).

def cancel1(d):
    print ‘cancel one’

def cancel2(d):
    print ‘cancel two’

def reportCancel(fail, which):
    fail.trap(defer.CancelledError)
    print ‘cancelled’, which

d1 = defer.Deferred(cancel1)
d1.addErrback(reportCancel, ‘one’)
d2 = defer.Deferred(cancel2)
d2.addErrback(reportCancel, ‘two’)
d1.chainDeferred(d2)
d1.cancel()
d2.cancel()
 

unexpected1.py:

# Normally: prints "called: None", instead of the probably expected
# "called: hey"

# tdefer.py: prints "called: hey"

def called(result):
    print ‘called:’, result

d1 = defer.Deferred()
d2 = defer.Deferred()
d1.chainDeferred(d2)
d1.addCallback(called)
d1.callback(‘hey’)
 

unexpected2.py:

# Normally: prints
#   called 2: hey
#   called 3: None

# tdefer.py: prints
#   called 2: hey
#   called 3: hey

def report2(result):
    print ‘called 2:’, result

def report3(result):
    print ‘called 3:’, result

d1 = defer.Deferred()
d2 = defer.Deferred().addCallback(report2)
d3 = defer.Deferred().addCallback(report3)
d1.chainDeferred(d2)
d1.chainDeferred(d3)
d1.callback(‘hey’)
 

I wont go into detail here as this post is already long enough. Those are 3 classes of behavior arising from chainDeferred being very simplistic. Comments welcome, of course. Once again, the runnable code is here.

Asynchronous data structures with Twisted Deferreds

Friday, July 23rd, 2010

Earlier this week I gave a talk titled Deferred Gratification (slides) at EuroPython in Birmingham. The title was supposed to hint at how much I love Twisted‘s Deferred class, and that it took me some time to really appreciate it.

While thinking about Deferreds in the days before the talk, I think I put my finger on one of the reasons I find Deferreds so attractive and elegant. The thought can be neatly summarized: Deferreds let you replace synchronous data structures with elegant asynchronous ones.

There are various ways that people get exposed to thinking about programming in an asynchronous style. The great majority do so via building user interfaces, usually with widget libraries wherein one specifies widgets and layouts, sets up event handlers, and then surrenders control to a main loop that thereafter routes events to handlers. Exposure to asynchronous programming via building UIs is becoming much more common as Javascript programmers build client-side web apps that operate asynchronously with web servers (and the “A” in AJAX of course stands for asynchronous). Others become aware of asynchronous programming via writing network code (perhaps using Twisted). Relatively few become aware of asynchronous programming via doing async filesystem I/O.

Because Twisted’s Deferreds don’t have anything to do with UIs or networking or filesystems, you can use them to implement other asynchronous things, like an asynchronous data structure. To show you what I mean, here’s a slightly simplified version of Twisted’s DeferredQueue class, taken from twisted/internet/defer.py:

class DeferredQueue(object):

    def __init__(self):
        # It would be better if these both used collections.deque (see comments section below).
        self.waiting = [] # Deferreds that are expecting an object
        self.pending = [] # Objects queued to go out via get.

    def put(self, obj):
        if self.waiting:
            self.waiting.pop(0).callback(obj)
        else:
            self.pending.append(obj)

    def get(self):
        if self.pending:
            return succeed(self.pending.pop(0))
        else:
            d = Deferred()
            self.waiting.append(d)
            return d
 

I find this extremely elegant, and I’m going to explain why.

But first, think about code for a regular synchronous queue. What happens if you call get on a regular queue that’s empty? Almost certainly one of two things: you’ll get some kind of QueueEmpty error, or else your code will block until some other code puts something into the queue. I.e., you either get a synchronous error or you get a synchronous non-empty response.

If you look at the get method in the code above, you’ll see that if the queue is empty (i.e., the pending list is empty), a new Deferred is made, added to self.waiting, and is immediately returned to the caller. So code calling get on an empty queue doesn’t get an error and doesn’t block, it always gets a result back essentially immediately. How can you get a result from an empty queue? Easy: the result is a Deferred. And because we’re in the asynchronous world, you just attach callbacks (like event handlers in the UI world) to the Deferred, and go on your merry way.

If you can follow that thinking, the rest of the code in the class above should be easy to grok. In put, if there are any outstanding Deferreds (i.e., earlier callers who hit an empty queue and got a Deferred back), the incoming object is given to the first of these by passing it to the callback function of the Deferred (and popping it out of the waiting list). If there are no outstanding Deferreds expecting a result, the incoming object is simply appended to self.pending. On the get side, if the queue (i.e., self.pending) is non-empty, the code creates a Deferred that has already been fired (using the succeed helper function) and returns it.

By now, this kind of code seems routine and straightforward to me. But it certainly wasn’t always that way. So if the above seems cryptic or abstract, I encourage you to think it over, maybe write some code, ask questions below, etc. To my mind, these kinds of constructs – simple, elegant, robust, practical, obviously(?) bug-free, single-threaded, etc. – are extremely instructive. You too can solve potentially gnarly (at least in the threaded world) networking coding challenges in very simple ways just by building code out of simple Twisted Deferreds (there’s nothing special about the DeferredQueue class, it’s just built using primitive Deferreds).

For extra points, think about the concept of asynchronous data structures. I find that concept very attractive, and putting my finger on it helped me to see part of the reason why I think Deferreds are so great. (In fact, as you can see from the use of succeed, Deferreds are not specifically asynchronous or synchronous: they’re so simple they don’t even care, and as a consumer of Deferreds, your code doesn’t have to either). That’s all remarkably elegant.

I sometimes hope O’Reilly will publish a second volume of Beautiful Code and ask me to write a chapter. I know exactly what I’d write about :-)

Twisted code for retrying function calls

Thursday, November 12th, 2009

These days I often find myself writing code to talk to services that are periodically briefly unavailable. An error of some kind occurs and the correct (and documented) action to take is just to retry the original call a little later. Examples include using Amazon’s S3 service and the Twitter API. In both of these services, transient failures happen fairly frequently.

So I wrote the Twisted class below to retry calls, and tried to make it fairly general. I’d be happy to hear comments on it, because it’s pretty simple and if it can be made bullet proof I imagine others will use it too.

In case you’re not familiar with Twisted and it’s not clear, the call retrying in the below is scheduled by the Twisted reactor. This all asynchronous event-based code that will not block (assuming the function you pass in also does not).

First off, here’s the class that handles the calling:

from twisted.internet import reactor, defer, task
from twisted.python import log, failure

class RetryingCall(object):
    """Calls a function repeatedly, passing it args and kw args. Failures
    are passed to a user-supplied failure testing function. If the failure
    is ignored, the function is called again after a delay whose duration
    is obtained from a user-supplied iterator. The start method (below)
    returns a deferred that fires with the eventual non-error result of
    calling the supplied function, or fires its errback if no successful
    result can be obtained before the delay backoff iterator raises
    StopIteration.
    "
""
    def __init__(self, f, *args, **kw):
        self._f = f
        self._args = args
        self._kw = kw
       
    def _err(self, fail):
        if self.failure is None:
            self.failure = fail
        try:
            fail = self._failureTester(fail)
        except:
            self._deferred.errback()
        else:
            if isinstance(fail, failure.Failure):
                self._deferred.errback(fail)
            else:
                log.msg(‘RetryingCall: Ignoring %r’ % (fail,))
                self._call()

    def _call(self):
        try:
            delay = self._backoffIterator.next()
        except StopIteration:
            log.msg(‘StopIteration in RetryingCall: ran out of attempts.’)
            self._deferred.errback(self.failure)
        else:
            d = task.deferLater(reactor, delay,
                                self._f, *self._args, **self._kw)
            d.addCallbacks(self._deferred.callback, self._err)

    def start(self, backoffIterator=None, failureTester=None):
        self._backoffIterator = iter(backoffIterator or simpleBackoffIterator())
        self._failureTester = failureTester or (lambda _: None)
        self._deferred = defer.Deferred()
        self.failure = None
        self._call()
        return self._deferred
 

You call the constructor with your function and the args it should be called with. Then you call start() to get back a deferred that will eventually fire with the result of the call, or an error. BTW, I called it “start” to mirror twisted.internet.task.LoopingCall.

There’s a helper function for producing successive inter-call delays:

from operator import mul
from functools import partial

def simpleBackoffIterator(maxResults=10, maxDelay=120.0, now=True,
                          initDelay=0.01, incFunc=None):
    assert maxResults > 0
    remaining = maxResults
    delay = initDelay
    incFunc = incFunc or partial(mul, 2.0)
   
    if now:
        yield 0.0
        remaining -= 1
       
    while remaining > 0:
        yield (delay if delay < maxDelay else maxDelay)
        delay = incFunc(delay)
        remaining -= 1
 

By default this will generate the sequence of inter-call delays 0.0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56 and it should be easy to see how you could write your own. Or you can just supply a list, etc. When the backoff iterator finishes, the RetryingCall class gives up on trying to get a non-error result from the function. In that case errback is called on the deferred that start() returns, with the failure from the first call.

You get to specify a function for testing failures. If it ever raises or returns a failure, the start() deferred’s errback is called. The failure tester can just ignore whatever failures should be considered transient.

So, for example, if you were calling S3 and wanted to ignore 504 errors, you could supply a failureTester arg like this:

    from twisted.web import error, http

    def test(self, failure):
        failure.trap(error.Error)
        if int(failure.value.status) != http.GATEWAY_TIMEOUT:
            return failure
 

As another example, while using the Twitter API you might want to allow a range of HTTP errors and also exactly one 404 error, seeing as a 404 might be an error on the part of Twitter (I don’t mean to suggest that actually happens). It’s probably definitive – but, why not try it once again just to be more sure? So, pass RetryingCall a failureTester that’s an instance of a class like this:

class TwitterFailureTester(object):
    okErrs = (http.INTERNAL_SERVER_ERROR,
              http.BAD_GATEWAY,
              http.SERVICE_UNAVAILABLE)

    def __init__(self):
        self.seen404 = False

    def __call__(self, failure):
        failure.trap(error.Error)
        status = int(failure.value.status)
        if status == http.NOT_FOUND:
            if self.seen404:
                return failure
            else:
                self.seen404 = True
        elif status not in self.okErrs:
            return failure
 

Changing existing code to use RetryingCall is pretty trivial. Take something like this

from twisted.web import client

def getUserByScreenname(screenname):
    d = client.getPage(
        ‘http://twitter.com/users/show.json?screen_name=glyf’)
    return d
 

and change it to look like this:

def getUserByScreenname(screenname):
    r = RetryingCall(client.getPage,
        ‘http://twitter.com/users/show.json?screen_name=glyf’)
    d = r.start(failureTester=TwitterFailureTester())
    return d
 

I wrote this about 10 days ago and posted it to the Twisted mailing list. No-one replied to say how horrible the code is or that it shoud be done another way, which is a pretty good sign. The above includes an improvement suggested by Tim Allen, and is slightly more useful than the code I posted originally (see the thread on the Twisted list for details).

Fault-tolerant Python Twisted classes for getting all Twitter friends or followers

Thursday, October 22nd, 2009

It’s been forever since I blogged here. I just wrote a little Python to grab all of a user’s friends or followers (or just their user ids). It uses Twisted, of course. There were two main reasons for doing this: 1) I want all friends/followers, not just the first bunch returned by the Twitter API, and 2) I wanted code that is fairly robust in the face of various 50x HTTP errors (I regularly experience INTERNAL_SERVER_ERROR, BAD_GATEWAY, and SERVICE_UNAVAILABLE).

If you want to use the code below and you’re not familiar with the Twitter API, consider whether you can use the FriendsIdFetcher and FollowersIdFetcher classes as they’ll do far fewer requests (you get 5000 results per API call, instead of 100). If you can live with user ids and do the occasional fetch of a full user, you’ll probably do far fewer API calls.

For the FriendsFetcher and FollowersFetcher classes, you get back a list of dictionaries, one per user. For FriendsIdFetcher and FollowersIdFetcher you get a list of Twitter user ids.

Of course there’s no documentation. Feel free to ask questions in the comments. Download the source.

import sys

from twisted.internet import defer
from twisted.web import client, error, http
   
if sys.hexversion >= 0x20600f0:
    import json
else:
    import simplejson as json

class _Fetcher(object):
    baseURL = ‘http://twitter.com/’
    URITemplate = None # Override in subclass.
    dataKey = None # Override in subclass.
    maxErrs = 10
    okErrs = (http.INTERNAL_SERVER_ERROR,
              http.BAD_GATEWAY,
              http.SERVICE_UNAVAILABLE)
   
    def __init__(self, name):
        assert self.baseURL.endswith(‘/’)
        self.results = []
        self.errCount = 0
        self.nextCursor = -1
        self.deferred = defer.Deferred()
        self.URL = self.baseURL + (self.URITemplate % { ‘name’ : name })

    def _fail(self, failure):
        failure.trap(error.Error)
        self.errCount += 1
        if (self.errCount < self.maxErrs and
            int(failure.value.status) in self.okErrs):
            self.fetch()
        else:
            self.deferred.errback(failure)
       
    def _parse(self, result):
        try:
            data = json.loads(result)
            self.nextCursor = data.get(‘next_cursor’)
            self.results.extend(data[self.dataKey])
        except Exception:
            self.deferred.errback()
        else:
            self.fetch()
           
    def _deDup(self):
        raise NotImplementedError(‘Override _deDup in subclasses.’)

    def fetch(self):
        if self.nextCursor:
            d = client.getPage(self.URL + ‘?cursor=%s’ % self.nextCursor)
            d.addCallback(self._parse)
            d.addErrback(self._fail)
        else:
            self.deferred.callback(self._deDup())
        return self.deferred

class _FriendsOrFollowersFetcher(_Fetcher):
    dataKey = u‘users’
   
    def _deDup(self):
        seen = set()
        result = []
        for userdict in self.results:
            uid = userdict[‘id’]
            if uid not in seen:
                result.append(userdict)
                seen.add(uid)
        return result

class _IdFetcher(_Fetcher):
    dataKey = u‘ids’
   
    def _deDup(self):
        # Keep the ids in the order we received them.
        seen = set()
        result = []
        for uid in self.results:
            if uid not in seen:
                result.append(uid)
                seen.add(uid)
        return result

class FriendsFetcher(_FriendsOrFollowersFetcher):
    URITemplate = ‘statuses/friends/%(name)s.json’

class FollowersFetcher(_FriendsOrFollowersFetcher):
    URITemplate = ‘statuses/followers/%(name)s.json’

class FriendsIdFetcher(_IdFetcher):
    URITemplate = ‘friends/ids/%(name)s.json’

class FollowersIdFetcher(_IdFetcher):
    URITemplate = ‘followers/ids/%(name)s.json’
 

Usage is dead simple:

fetcher = FriendsFetcher(‘terrycojones’)
d = fetcher.fetch()
d.addCallback(….) # etc.
 

Enjoy.

Facebook release Tornado and it’s not based on Twisted?

Saturday, September 12th, 2009

Image: Jay Smith

Image: Jay Smith

To their great credit, Facebook have just open-sourced more of their core software. This time it’s Tornado, an asynchronous web server written in Python.

Surely that can only mean one thing: Tornado is based on Twisted. Right?

Incredibly, no. Words fail me on this one. I’ve spent some hours today trying to put my thoughts into order so I could put together a reasonably coherent blog post on the subject. But I’ve failed. So here are some unstructured thoughts.

First of all, I’m not meaning to bash Facebook on this. At Fluidinfo we use their Thrift code. We’ll almost certainly use Scribe for logging at some point, and we’re seriously considering using Cassandra. Esteve Fernandez has put a ton of work into txAMQP to glue together Thrift, Twisted, and AMQP, and in the process became a Thrift committer.

Second, you can understand—or make an educated guess at—what happened: the Facebook programmers, like programmers everywhere, were strongly tempted to just write their own code instead of dealing with someone else’s. It’s not just about learning curves and fixing deficiencies, there are also issues of speed of changes and of control. At Fluidinfo we suffered through six months of maintaining our own set of related patches to Thrift before the Twisted support Esteve wrote was finally merged to trunk. That was painful and the temptation to scratch our own itch, fork, and forget about the official Thrift project was high.

Plus, Twisted suffers from the fact that the documentation is not as good as it could be. I commented at length on this over three years ago. Please read the follow-up posts in that thread for an illustration (one of many) of the maturity of the people running Twisted. Also note that the documentation has improved greatly since then. Nevertheless, Twisted is a huge project, it has tons of parts, and it’s difficult to document and to wrap your head around no matter what.

So you can understand why Facebook might have decided not to use Twisted. In their words:

We ended up writing our own web server and framework after looking at existing servers and tools like Twisted because none matched both our performance requirements and our ease-of-use requirements.

I’m betting it’s that last part that’s the key to the decision not to use Twisted.

But seriously…… WTF?

Twisted is an amazing piece of work, written by some truly brilliant coders, with huge experience doing exactly what Facebook set out to reinvent.

This is where I’m at a loss for words. I think: “what an historic missed opportunity” and “reinventing the wheel, badly” and “no, no, no, this cannot be” and “this is just so short-sighted” and “a crying shame” and many things besides.

Sure, Twisted is difficult to grok. But that’s no excuse. It’s a seriously cool and powerful framework, it’s vastly more sophisticated and useful and extensible than what Facebook have cobbled together. Facebook could have worked to improve twisted.web (which everyone agrees has some shortcomings) which could have benefitted greatly from even a small fraction of the resources Facebook must have put into Tornado. The end result would have been much better. Or Facebook could have just ignored twisted.web and built directly on top of the Twisted core. That would have been great too.

Or Facebook could have had a team of people who knew how to do it better, and produced something better than Twisted. I guess that’s the real frustration here – they’ve put a ton of effort into building something much more limited in scope and vision, and even the piece that they did build looks like a total hack built to scratch their short term needs.

What’s the biggest change in software engineering over the last decade? Arguably it’s the rise of test-driven development. I’m not the only one who thinks so. Yet here we are in late 2009 and Facebook have released a major piece of code with no test suite. Amazing. OK, you could argue this is a minor thing, that it’s not core to Tornado. That argument has some weight, but it’s hard to think that this thing is not a hack.

If you decide to use an asynchronous web framework, do you expect to have to manually set your sockets to be non-blocking? Do you feel like catching EWOULDBLOCK and EAGAIN yourself? Those sorts of things, and their non-portability (even within the world of Linux) are precisely the kinds of things that lead people away from writing C and towards doing rapid development using something robust and portable that looks after the details. They’re precisely the things Twisted takes care of for you, and which (at least in Twisted) work across platforms, including Windows.

It looks like Tornado are using a global reactor, which the Twisted folks have learned the hard way is not the best solution.

Those are just some of the complaints I’ve heard and seen in the Tornado code. I confess I’ve looked only superficially at their code – but more than enough to feel such a sense of lost opportunity. They built a small subsection of Twisted, they’ve done it with much less experience and elegance and hiding of detail than the Twisted code, and the thing doesn’t even come with a test suite. Who knows if it actually works, or when, or where, etc.?

And…. Twisted is so much more. HTTP is just one of many protocols Twisted speaks, including (from their home page): “TCP, UDP, SSL/TLS, multicast, Unix sockets, a large number of protocols including HTTP, NNTP, IMAP, SSH, IRC, FTP, and others”.

Want to build a sophisticated, powerful, and flexible asynchronous internet service in Python? Use Twisted.

A beautiful thing about Twisted is that it expands your mind. Its abstractions (particularly the clean separation and generality of transports, protocols, factories, services, and Deferreds—see here and here and here) makes you a better programmer. As I wrote to some friends in April 2006: “Reading the Twisted docs makes me feel like my brain is growing new muscles.”

Twisted’s deferreds are extraordinarily elegant and powerful, I’ve blogged and posted to the Twisted mailing list about them on multiple occasions. Learning to think in the Twisted way has been a programming joy to me over the last 3 years, so you can perhaps imagine my dismay that a company with the resources of Facebook couldn’t be bothered to get behind it and had to go reinvent the wheel, and do it so poorly. What a pity.

In my case, I threw away an entire year of C code in order to use Twisted in FluidDB. That was a decision I didn’t take lightly. I’d written my own libraries to do lots of low level network communications and RPC – including auto-generating server and client side glue libraries to serialize and unserialize RPC calls and args (a bit like Thrift), plus a server and tons of other stuff. I chucked it because it was too brittle. It was too much of a hack. It wasn’t portable enough. It was too get the details right. It wasn’t extensible.

In other words….. it was too much like Tornado! So I threw it all away in favor of Twisted. As I happily tell people, FluidDB is written in Python so it can use Twisted. It was a question of an amazingly great asynchronous networking framework determining the choice of programming language. And this was done in spite of the fact that I thought the Twisted docs sucked badly. The people behind Twisted were clearly brilliant and the community was great. There was an opportunity to make a bet on something and to contribute. I wish Facebook had made the same decision. It’s everyone’s loss that they did not. What a great pity.

A mixin class allowing Python __init__ methods to work with Twisted deferreds

Monday, May 11th, 2009

I posted to the Python Twisted list back in Nov 2008 with subject: A Python metaclass for Twisted allowing __init__ to return a Deferred

Briefly, I was trying to find a nice way to allow the __init__ method of a class to work with deferreds in such a way that methods of the class could use work done by __init__ safe in the knowledge that the deferreds had completed. E.g., if you have

class X(object):
    def __init__(self, host, port):
        def final(connection):
            self.db = connection
        d = makeDBConnection(host, port)
        d.addCallback(final)

    def query(self, q):
        return self.db.runQuery(q)
 

Then when you make an X and call query on it, there’s a chance the deferred wont have fired, and you’ll get an error. This is just a very simple illustrative example. There are many more, and this is a general problem of the synchronous world (in which __init__ is supposed to prepare a fully-fledged class instance and cannot return a deferred) meeting the asynchronous world in which, as Twisted programmers, we would like to (and must) use deferreds.

The earlier thread, with lots of useful followups can be read here. Although I learned a lot in that thread, I wasn’t completely happy with any of the solutions. Some of the things that still bugged me are in posts towards the end of the thread (here and here).

The various approaches we took back then all boiled down to waiting for a deferred to fire before the class instance was fully ready to use. When that happened, you had your instance and could call its methods.

I had also thought about an alternate approach: having __init__ add a callback to the deferreds it dealt with to set a flag in self and then have all dependent methods check that flag to see if the class instance was ready for use. But that 1) is ugly (too much extra code); 2) means the caller has to be prepared to deal with errors due to the class instance not being ready, and 3) adds a check to every method call. It would look something like this:

class X(object):
    def __init__(self, host, port):
        self.ready = False
        def final(connection):
            self.db = connection
            self.ready = True
        d = makeDBConnection(host, port)
        d.addCallback(final)

    def query(self, q):
        if not self.ready:
            raise IAmNotReadyException()
        return self.db.runQuery(q)
 

That was too ugly for my taste, for all of the above reasons, most especially for forcing the unfortunate caller of my code to handle IAmNotReadyException.

Anyway…. fast forward 6 months and I’ve hit the same problem again. It’s with existing code, in which I would like an __init__ to call something that (now, due to changes elsewhere) returns a deferred. So I started thinking again, and came up with a much cleaner way to do the alternate approach via a class mixin:

from twisted.internet import defer

class deferredInitMixin(object):
    def wrap(self, d, *wrappedMethods):
        self.waiting = []
        self.stored = {}

        def restore(_):
            for method in self.stored:
                setattr(self, method, self.stored[method])
            for d in self.waiting:
                d.callback(None)

        def makeWrapper(method):
            def wrapper(*args, **kw):
                d = defer.Deferred()
                d.addCallback(lambda _: self.stored[method](*args, **kw))
                self.waiting.append(d)
                return d
            return wrapper

        for method in wrappedMethods:
            self.stored[method] = getattr(self, method)
            setattr(self, method, makeWrapper(method))

        d.addCallback(restore)
 

You use it as in the class Test below:

from twisted.internet import defer, reactor

def fire(d, value):
    print "I finally fired, with value", value
    d.callback(value)

def late(value):
    d = defer.Deferred()
    reactor.callLater(1, fire, d, value)
    return d

def called(result, what):
    print ‘final callback of %s, result = %s’ % (what, result)

def stop(_):
    reactor.stop()

class Test(deferredInitMixin):
    def __init__(self):
        d = late(‘Test’)
        deferredInitMixin.wrap(self, d, ‘f1′, ‘f2′)

    def f1(self, arg):
        print "f1 called with", arg
        return late(arg)

    def f2(self, arg):
        print "f2 called with", arg
        return late(arg)

if __name__ == ‘__main__’:
    t = Test()
    d1 = t.f1(44)
    d1.addCallback(called, ‘f1′)
    d2 = t.f1(33)
    d2.addCallback(called, ‘f1′)
    d3 = t.f2(11)
    d3.addCallback(called, ‘f2′)
    d = defer.DeferredList([d1, d2, d3])
    d.addBoth(stop)
    reactor.run()
 

Effectively, the __init__ of my Test class asks deferredInitMixin to temporarily wrap some of its methods. deferredInitMixin stores the original methods away and replaces each of them with a function that immediately returns a deferred. So after __init__ finishes, code that calls the now-wrapped methods of the class instance before the deferred has fired will get a deferred back as usual (but see * below). As far as they know, everything is normal. Behind the scenes, deferredInitMixin has arranged for these deferreds to fire only after the deferred passed from __init__ has fired. Once that happens, deferredInitMixin also restores the original functions to the instance. As a result there is no overhead later to check a flag to see if the instance is ready to use. If the deferred from __init__ happens to fire before any of the instance’s methods are called, it will simply restore the original methods. Finally (obviously?) you only pass the method names to deferredInitMixin that depend on the deferred in __init__ being done.

BTW, calling the methods passed to deferredInitMixin “wrapped” isn’t really accurate. They’re just temporarily replaced.

I quite like this approach. It’s a second example of something I posted about here, in which a pool of deferreds is accumulated and all fired when another deferred fires. It’s nice because you don’t reply with an error and there’s no need for locking or other form of coordination – the work you need done is already in progress, so you get back a fresh deferred and everything goes swimmingly.

* Minor note: the methods you wrap should probably be ones that already return deferreds. That way you always get a deferred back from them, whether they’re temporarily wrapped or not. The above mixin works just fine if you ask it to wrap non-deferred-returning functions, but you have to deal with the possibility that they will return a deferred (i.e., if you call them while they’re wrapped).

A kinder and more consistent defer.inlineCallbacks

Friday, November 21st, 2008

Here’s a suggestion for making Twisted‘s inlineCallbacks function decorator more consistent and less confusing. Let’s suppose you’re writing something like this:

    @inlineCallbacks
    def func():
        # Do something.

    result = func()
 

There are 2 things that could be better, IMO:

1. func may not yield. In that case, you get an AttributeError when inlineCallbacks tries to send() to something that’s not a generator. Or worse, the call to send might actually work, and do who knows what. I.e., func() could return an object with a send method but which is not a generator. For some fun, run some code that calls the following decorated function (see if you can figure out what will happen before you do):

    @defer.inlineCallbacks
    def f():
        class yes():
            def send(x, y):
                print ‘yes’
                # accidentally_destroy_the_universe_too()
        return yes()
 

2. func might raise before it get to its first yield. In that case you’ll get an exception thrown when the inlineCallbacks decorator tries to create the wrapper function:

    File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 813, in unwindGenerator
      return _inlineCallbacks(None, f(*args, **kwargs), Deferred())

There’s a simple and consistent way to handle both of these. Just have inlineCallbacks do some initial work based on what it has been passed:

    def altInlineCallbacks(f):
        def unwindGenerator(*args, **kwargs):
            deferred = defer.Deferred()
            try:
                result = f(*args, **kwargs)
            except Exception, e:
                deferred.errback(e)
                return deferred
            if isinstance(result, types.GeneratorType):
                return defer._inlineCallbacks(None, result, deferred)
            deferred.callback(result)
            return deferred

        return mergeFunctionMetadata(f, unwindGenerator)
 

This has the advantage that (barring e.g., a KeyboardInterrupt in the middle of things) you’ll *always* get a deferred back when you call an inlineCallbacks decorated function. That deferred might have already called or erred back (corresponding to cases 1 and 2 above).

I’m going to use this version of inlineCallbacks in my code. There’s a case for it making it into Twisted itself: inlinecallbacks is already cryptic enough in its operation that anything we can do to make its operation more uniform and less surprising, the better.

You might think that case 1 rarely comes up. But I’ve hit it a few times, usually when commenting out sections of code for testing. If you accidentally comment out the last yield in func, it no longer returns a generator and that causes a different error.

And case 2 happens to me too. Having inlinecallbacks try/except the call to func is nicer because it means I don’t have to be quite so defensive in coding. So instead of me having to write

    try:
        d = func()
    except Exception:
        # Do something.
 

and try to figure out what happened if an exception fired, I can just write d = func() and add errbacks as I please (they then have to figure out what happened). The (slight?) disadvantage to my suggestion is that with the above try/except fragment you can tell if the call to func() raised before ever yielding. You can detect that, if you need to, with my approach if you’re not offended by looking at d.called immediately after calling func.

The alternate approach also helps if you’re a novice, or simply being lazy/careless/forgetful, and writing:

    d = func()
    d.addCallback(ok)
    d.addErrback(not_ok)
 

thinking you have your ass covered, but you actually don’t (due to case 2).

There’s some test code here that illustrates all this.

A Python metaclass for Twisted allowing __init__ to return a Deferred

Monday, November 3rd, 2008

OK, I admit, this is geeky.

But we’ve all run into the situation in which you’re using Python and Twisted, and you’re writing a new class and you want to call something from the __init method that returns a Deferred. This is a problem. The __init method is not allowed to return a value, let alone a Deferred. While you could just call the Deferred-returning function from inside your __init, there’s no guarantee of when that Deferred will fire. Seeing as you’re in your __init method, it’s a good bet that you need that function to have done its thing before you let anyone get their hands on an instance of your class.

For example, consider a class that provides access to a database table. You want the __init__ method to create the table in the db if it doesn’t already exist. But if you’re using Twisted’s twisted.enterprise.adbapi class, the runInteraction method returns a Deferred. You can call it to create the tables, but you don’t want the instance of your class back in the hands of whoever’s creating it until the table is created. Otherwise they might call a method on the instance that expects the table to be there.

A cumbersome solution would be to add a callback to the Deferred you get back from runInteraction and have that callback add an attribute to self to indicate that it is safe to proceed. Then all your class methods that access the db table would have to check to see if the attribute was on self, and take some alternate action if not. That’s going to get ugly very fast plus, your caller has to deal with you potentially not being ready.

I ran into this problem a couple of days ago and after scratching my head for a while I came up with an idea for how to solve this pretty cleanly via a Python metaclass. Here’s the metaclass code:

from twisted.internet import defer

class TxDeferredInitMeta(type):
    def __new__(mcl, classname, bases, classdict):
        hidden = ‘__hidden__’
        instantiate = ‘__instantiate__’
        for name in hidden, instantiate:
            if name in classdict:
                raise Exception(
                    ‘Class %s contains an illegally-named %s method’ %
                    (classname, name))
        try:
            origInit = classdict[‘__init__’]
        except KeyError:
            origInit = lambda self: None
        def newInit(self, *args, **kw):
            hiddenDict = dict(args=args, kw=kw, __init__=origInit)
            setattr(self, hidden, hiddenDict)
        def _instantiate(self):
            def addSelf(result):
                return (self, result)
            hiddenDict = getattr(self, hidden)
            d = defer.maybeDeferred(hiddenDict[‘__init__’], self,
                                    *hiddenDict[‘args’], **hiddenDict[‘kw’])
            return d.addCallback(addSelf)
        classdict[‘__init__’] = newInit
        classdict[instantiate] = _instantiate
        return super(TxDeferredInitMeta, mcl).__new__(
            mcl, classname, bases, classdict)
 

I’m not going to explain what it does here. If it’s not clear and you want to know, send me mail or post a comment. But I’ll show you how you use it in practice. It’s kind of weird, but it makes sense once you get used to it.

First, we make a class whose metaclass is TxDeferredInitMeta and whose __init__ method returns a deferred:

class MyClass(object):
    __metaclass__ = TxDeferredInitMeta
    def __init__(self):
        d = aFuncReturningADeferred()
        return d
 

Having __init__ return anything other than None is illegal in normal Python classes. But this is not a normal Python class, as you will now see.

Given our class, we use it like this:

def cb((instance, result)):
    # instance is an instance of MyClass
    # result is from the callback chain of aFuncReturningADeferred
    pass

d = MyClass()
d.__instantiate__()
d.addCallback(cb)
 

That may look pretty funky, but if you’re used to Twisted it wont seem too bizarre. What’s happening is that when you ask to make an instance of MyClass, you get back an instance of a regular Python class. It has a method called __instantiate__ that returns a Deferred. You add a callback to that Deferred and that callback is eventually passed two things. The first is an instance of MyClass, as you requested. The second is the result that came down the callback chain from the Deferred that was returned by the __init__ method you wrote in MyClass.

The net result is that you have the value of the Deferred and you have your instance of MyClass. It’s safe to go ahead and use the instance because you know the Deferred has been called. It will probably seem a bit odd to get your instance later as a result of a Deferred firing, but that’s perfectly in keeping with the Twisted way.

That’s it for now. You can grab the code and a trial test suite to put it through its paces at http://foss.fluidinfo.com/txDeferredInitMeta.zip. The code could be cleaned up somewhat, and made more general. There is a caveat to using it – your class can’t have __hidden__ or __instantiate__ methods. That could be improved. But I’m not going to bother for now, unless someone cares.