
Finishing Proust, redux

03:57 June 21st, 2010 by terry. Posted under books, me. | 2 Comments »

Back in December 2006 I wrote about finishing Proust and made a rough argument about how often anyone on earth finishes the whole thing. The argument was a bit subtle. I was never 100% convinced it was sound, but no-one I showed it to found a hole in it. I still think about the question from time to time. The other day I mentioned the original post to Tim O’Reilly. Later that day, I realized there’s a much simpler way to get an estimate, with far fewer assumptions.

The new approach is simply to divide the number of hours that have passed since In Search of Lost Time was published by the number of people who’ve ever finished it. That average is a crude measure, but it may nevertheless be quite accurate, and it’s irresistibly interesting to me to see how it compares with my original 2006 estimate of 2.19 hours.

So, assume 2B people were alive in 1927 when the final volume was published, and 6.4B alive at the end of 2006 (source).

Assume that no-one alive in 1927 was still alive in 2006 (obviously not the case, but not unreasonable and not a significant source of error). I.e., there were 4.4B births in those 79 years. Note: this ignores a significant number of people who were born after 1927 and died before 2006, but it includes everyone born from 1990 onwards, essentially none of whom would have read Proust by 2006.

In my original post I estimated that one person in 10K actually finishes the whole book. So that’s 4.4B/10K = 440K people who read the book during the 79 years.

79 years is 28,835 days (ignoring leap years), or 692,040 hours. Doing the division: 692,040 / 440,000 = 1.57 hours.

I.e., by the above rough reasoning, someone, somewhere on earth, finishes Proust every 1.57 hours, on average.

I find the closeness of the two estimates quite remarkable. There’s only one shared assumption (1 in 10,000 finishes). Both estimates are quite crude, yet there’s only about a 30% difference in the answers. I was expecting them to be much more divergent.
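The arithmetic above is easy to sanity-check in a few lines of Python, using the post’s own assumptions (4.4B births, 1 in 10,000 finishing, 79 years):

```python
# Back-of-envelope check of the "hours per Proust finisher" estimate.
births = 6.4e9 - 2.0e9        # people born between 1927 and 2006
finishers = births / 10000    # assume 1 in 10,000 finishes the whole book
hours = 79 * 365 * 24         # hours elapsed in those 79 years
per_finisher = hours / finishers
print(round(per_finisher, 2))  # -> 1.57
```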


bzr viz is so pretty

21:32 January 18th, 2010 by terry. Posted under programming, tech. | Comments Off on bzr viz is so pretty

A visual summary of my coding work in the last week, creating branches, working on them, merging them back into the FluidDB trunk.

[Screenshot: bzr viz revision graph showing the week’s branches and merges]


At what point does an Amazon EC2 reserved instance become worth it?

14:32 January 8th, 2010 by terry. Posted under companies, tech. | 2 Comments »

If you purchase an Amazon EC2 reserved instance, you’ll pay a certain amount up front (pricing). If you don’t use the instance much, it will be more expensive per hour than a regular on-demand instance. E.g., if you paid $227.50 to reserve a small instance for a year but then only used it for a single day, you’d be paying almost $10/hr and it would obviously be much cheaper to just get an on-demand instance and pay just 8.5 cents per hour.

OTOH, if you ran a small instance for a year at the on-demand price, you’d pay $745 and it would obviously be cheaper to pay the up-front reservation price ($227.50) plus a year of the low per-hour pricing (365 * 24 * $0.03), or $490.

So for how long do you have to run an instance in order for it to be cheaper to pay for a reserved instance? (Note that I’m ignoring the time value of money, what you might do with the up-front money in the meantime if you didn’t give it to Amazon in advance, etc.)

The answer is pretty simple: for a one-year reservation you need to run the instance for about 6 months to make it worthwhile. For a three-year reservation you need to run the instance for at least 3 months per year, on average.
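That break-even point falls directly out of the prices. Here’s a quick sketch using the 2010 small-instance numbers quoted above (the reservation pays for itself once the hourly savings repay the up-front cost):

```python
upfront = 227.50         # one-year reserved small instance, paid up front
reserved_hourly = 0.03   # $/hour once reserved
ondemand_hourly = 0.085  # $/hour on demand

# Hours of use at which the reservation becomes cheaper.
breakeven_hours = upfront / (ondemand_hourly - reserved_hourly)
print(int(round(breakeven_hours)))                      # -> 4136
print(round(breakeven_hours / 24 / 365 * 12, 1))        # -> 5.7 (months)
```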

Here’s a fragment from a simple spreadsheet I made, based on the US N. Virginia prices:

[Spreadsheet fragment: EC2 on-demand vs. reserved instance costs]


Twisted code for retrying function calls

10:53 November 12th, 2009 by terry. Posted under deferreds, python, twisted. | 4 Comments »

These days I often find myself writing code to talk to services that are periodically briefly unavailable. An error of some kind occurs and the correct (and documented) action to take is just to retry the original call a little later. Examples include using Amazon’s S3 service and the Twitter API. In both of these services, transient failures happen fairly frequently.

So I wrote the Twisted class below to retry calls, and tried to make it fairly general. I’d be happy to hear comments on it: it’s pretty simple, and if it can be made bulletproof I imagine others will use it too.

In case you’re not familiar with Twisted and it’s not clear, the call retrying below is scheduled by the Twisted reactor. This is all asynchronous, event-based code that will not block (assuming the function you pass in doesn’t block either).

First off, here’s the class that handles the calling:

from twisted.internet import reactor, defer, task
from twisted.python import log, failure

class RetryingCall(object):
    """Calls a function repeatedly, passing it args and kw args. Failures
    are passed to a user-supplied failure testing function. If the failure
    is ignored, the function is called again after a delay whose duration
    is obtained from a user-supplied iterator. The start method (below)
    returns a deferred that fires with the eventual non-error result of
    calling the supplied function, or fires its errback if no successful
    result can be obtained before the delay backoff iterator raises
    StopIteration.
    """
    def __init__(self, f, *args, **kw):
        self._f = f
        self._args = args
        self._kw = kw
        
    def _err(self, fail):
        if self.failure is None:
            self.failure = fail
        try:
            fail = self._failureTester(fail)
        except:
            self._deferred.errback()
        else:
            if isinstance(fail, failure.Failure):
                self._deferred.errback(fail)
            else:
                log.msg('RetryingCall: Ignoring %r' % (fail,))
                self._call()

    def _call(self):
        try:
            delay = self._backoffIterator.next()
        except StopIteration:
            log.msg('StopIteration in RetryingCall: ran out of attempts.')
            self._deferred.errback(self.failure)
        else:
            d = task.deferLater(reactor, delay,
                                self._f, *self._args, **self._kw)
            d.addCallbacks(self._deferred.callback, self._err)

    def start(self, backoffIterator=None, failureTester=None):
        self._backoffIterator = iter(backoffIterator or simpleBackoffIterator())
        self._failureTester = failureTester or (lambda _: None)
        self._deferred = defer.Deferred()
        self.failure = None
        self._call()
        return self._deferred

You call the constructor with your function and the args it should be called with. Then you call start() to get back a deferred that will eventually fire with the result of the call, or an error. BTW, I called it “start” to mirror twisted.internet.task.LoopingCall.

There’s a helper function for producing successive inter-call delays:

from operator import mul
from functools import partial

def simpleBackoffIterator(maxResults=10, maxDelay=120.0, now=True,
                          initDelay=0.01, incFunc=None):
    assert maxResults > 0
    remaining = maxResults
    delay = initDelay
    incFunc = incFunc or partial(mul, 2.0)
    
    if now:
        yield 0.0
        remaining -= 1
        
    while remaining > 0:
        yield (delay if delay < maxDelay else maxDelay)
        delay = incFunc(delay)
        remaining -= 1

By default this will generate the sequence of inter-call delays 0.0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56 and it should be easy to see how you could write your own. Or you can just supply a list, etc. When the backoff iterator finishes, the RetryingCall class gives up on trying to get a non-error result from the function. In that case errback is called on the deferred that start() returns, with the failure from the first call.
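For example, a custom backoff that simply retries five times at a fixed one-second interval is just a generator (fixedBackoff here is a hypothetical name, not part of the code above):

```python
def fixedBackoff(attempts=5, delay=1.0):
    # Yield the same inter-call delay a fixed number of times; when the
    # generator is exhausted, RetryingCall gives up and errbacks.
    for _ in range(attempts):
        yield delay

# Usable anywhere start() expects one: r.start(backoffIterator=fixedBackoff())
print(list(fixedBackoff()))  # -> [1.0, 1.0, 1.0, 1.0, 1.0]
```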

You get to specify a function for testing failures. If it ever raises or returns a failure, the start() deferred's errback is called. The failure tester can just ignore whatever failures should be considered transient.

So, for example, if you were calling S3 and wanted to ignore 504 errors, you could supply a failureTester arg like this:

    from twisted.web import error, http

    def test(self, failure):
        failure.trap(error.Error)
        if int(failure.value.status) != http.GATEWAY_TIMEOUT:
            return failure

As another example, when using the Twitter API you might want to allow a range of HTTP errors, and also exactly one 404, seeing as a 404 might conceivably be an error on Twitter’s part (I don’t mean to suggest that actually happens). A 404 is probably definitive – but why not try once more, just to be sure? To do that, pass RetryingCall a failureTester that’s an instance of a class like this:

class TwitterFailureTester(object):
    okErrs = (http.INTERNAL_SERVER_ERROR,
              http.BAD_GATEWAY,
              http.SERVICE_UNAVAILABLE)

    def __init__(self):
        self.seen404 = False

    def __call__(self, failure):
        failure.trap(error.Error)
        status = int(failure.value.status)
        if status == http.NOT_FOUND:
            if self.seen404:
                return failure
            else:
                self.seen404 = True
        elif status not in self.okErrs:
            return failure

Changing existing code to use RetryingCall is pretty trivial. Take something like this

from twisted.web import client

def getUserByScreenname(screenname):
    d = client.getPage(
        'http://twitter.com/users/show.json?screen_name=glyf')
    return d

and change it to look like this:

def getUserByScreenname(screenname):
    r = RetryingCall(client.getPage,
        'http://twitter.com/users/show.json?screen_name=glyf')
    d = r.start(failureTester=TwitterFailureTester())
    return d

I wrote this about 10 days ago and posted it to the Twisted mailing list. No-one replied to say how horrible the code is or that it should be done another way, which is a pretty good sign. The above includes an improvement suggested by Tim Allen, and is slightly more useful than the code I posted originally (see the thread on the Twisted list for details).

All code above is available to you under CC0 1.0 Universal - Public Domain Dedication.


Fault-tolerant Python Twisted classes for getting all Twitter friends or followers

00:56 October 22nd, 2009 by terry. Posted under python, twisted, twitter. | Comments Off on Fault-tolerant Python Twisted classes for getting all Twitter friends or followers

It’s been forever since I blogged here. I just wrote a little Python to grab all of a user’s friends or followers (or just their user ids). It uses Twisted, of course. There were two main reasons for doing this: 1) I want all friends/followers, not just the first bunch returned by the Twitter API, and 2) I wanted code that is fairly robust in the face of various 50x HTTP errors (I regularly experience INTERNAL_SERVER_ERROR, BAD_GATEWAY, and SERVICE_UNAVAILABLE).

If you want to use the code below and you’re not familiar with the Twitter API, consider whether you can use the FriendsIdFetcher and FollowersIdFetcher classes: they return 5,000 results per API call instead of 100, so if you can live with user ids (and the occasional fetch of a full user), you’ll make far fewer requests.
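To see why the page size matters, compare the number of API calls needed for a user with, say, a million followers (a made-up number, with the 100 and 5,000 page sizes from above):

```python
import math

followers = 1000000
# FollowersFetcher pages through full user records, 100 at a time;
# FollowersIdFetcher pages through bare ids, 5,000 at a time.
full_user_calls = int(math.ceil(followers / 100.0))
id_only_calls = int(math.ceil(followers / 5000.0))
print('%d vs %d API calls' % (full_user_calls, id_only_calls))  # -> 10000 vs 200 API calls
```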

For the FriendsFetcher and FollowersFetcher classes, you get back a list of dictionaries, one per user. For FriendsIdFetcher and FollowersIdFetcher you get a list of Twitter user ids.

Of course there’s no documentation. Feel free to ask questions in the comments. Download the source.

import sys

from twisted.internet import defer
from twisted.web import client, error, http
    
if sys.hexversion >= 0x20600f0:
    import json
else:
    import simplejson as json

class _Fetcher(object):
    baseURL = 'http://twitter.com/'
    URITemplate = None # Override in subclass.
    dataKey = None # Override in subclass.
    maxErrs = 10
    okErrs = (http.INTERNAL_SERVER_ERROR,
              http.BAD_GATEWAY,
              http.SERVICE_UNAVAILABLE)
    
    def __init__(self, name):
        assert self.baseURL.endswith('/')
        self.results = []
        self.errCount = 0
        self.nextCursor = -1
        self.deferred = defer.Deferred()
        self.URL = self.baseURL + (self.URITemplate % { 'name' : name })

    def _fail(self, failure):
        failure.trap(error.Error)
        self.errCount += 1
        if (self.errCount < self.maxErrs and
            int(failure.value.status) in self.okErrs):
            self.fetch()
        else:
            self.deferred.errback(failure)
        
    def _parse(self, result):
        try:
            data = json.loads(result)
            self.nextCursor = data.get('next_cursor')
            self.results.extend(data[self.dataKey])
        except Exception:
            self.deferred.errback()
        else:
            self.fetch()
            
    def _deDup(self):
        raise NotImplementedError('Override _deDup in subclasses.')

    def fetch(self):
        if self.nextCursor:
            d = client.getPage(self.URL + '?cursor=%s' % self.nextCursor)
            d.addCallback(self._parse)
            d.addErrback(self._fail)
        else:
            self.deferred.callback(self._deDup())
        return self.deferred

class _FriendsOrFollowersFetcher(_Fetcher):
    dataKey = u'users'
    
    def _deDup(self):
        seen = set()
        result = []
        for userdict in self.results:
            uid = userdict['id']
            if uid not in seen:
                result.append(userdict)
                seen.add(uid)
        return result

class _IdFetcher(_Fetcher):
    dataKey = u'ids'
    
    def _deDup(self):
        # Keep the ids in the order we received them.
        seen = set()
        result = []
        for uid in self.results:
            if uid not in seen:
                result.append(uid)
                seen.add(uid)
        return result

class FriendsFetcher(_FriendsOrFollowersFetcher):
    URITemplate = 'statuses/friends/%(name)s.json'

class FollowersFetcher(_FriendsOrFollowersFetcher):
    URITemplate = 'statuses/followers/%(name)s.json'

class FriendsIdFetcher(_IdFetcher):
    URITemplate = 'friends/ids/%(name)s.json'

class FollowersIdFetcher(_IdFetcher):
    URITemplate = 'followers/ids/%(name)s.json'

Usage is dead simple:

fetcher = FriendsFetcher('terrycojones')
d = fetcher.fetch()
d.addCallback(....) # etc.

Enjoy.


Crowdsourcing Arabic-to-English translation in the Geneva airport

15:11 October 10th, 2009 by terry. Posted under books, me. | Comments Off on Crowdsourcing Arabic-to-English translation in the Geneva airport

Today I met an extraordinary Iranian man in the Geneva airport. He’s written a 1000-page book in Arabic about (at least in part) his experiences in Cyprus. He approached me, asked if my English was really really good, sat next to me, and started pulling out several pages of hand-written uppercase English. He had me go over them, improve them, write some new text as he read his Arabic in halting English, told me exactly how he wanted it to sound, pressed me to find shorter ways to say things, and finally got me to write out (for his next helper, no doubt) a clean copy of all my work. He had me look up a recent paper dating the evolutionary split between humans & chimpanzees and confirm that it didn’t contradict his text (another fragment thrust importunately into my hands). He was about 75. We spent 90 minutes together, smiling and congratulating each other over a few sentences that turned out particularly well. He told me he’s going to have it published by Oxford – that’s his aim anyway.

I thought to myself that we each have our own mountain to climb – or at least those of us with a taste for years-long patient endeavors do – but how different his is from mine. We parted and he went off to approach another stranger. He’ll get the whole book done a few pages a day in the Geneva airport, I’ve no doubt. “It’s the perfect place,” he told me. Amazing, extraordinary, humbling, etc…


Facebook release Tornado and it’s not based on Twisted?

14:25 September 12th, 2009 by terry. Posted under FluidDB, python, tech, twisted. | 17 Comments »

Image: Jay Smith

To their great credit, Facebook have just open-sourced more of their core software. This time it’s Tornado, an asynchronous web server written in Python.

Surely that can only mean one thing: Tornado is based on Twisted. Right?

Incredibly, no. Words fail me on this one. I’ve spent some hours today trying to put my thoughts into order so I could put together a reasonably coherent blog post on the subject. But I’ve failed. So here are some unstructured thoughts.

First of all, I’m not meaning to bash Facebook on this. At Fluidinfo we use their Thrift code. We’ll almost certainly use Scribe for logging at some point, and we’re seriously considering using Cassandra. Esteve Fernandez has put a ton of work into txAMQP to glue together Thrift, Twisted, and AMQP, and in the process became a Thrift committer.

Second, you can understand—or make an educated guess at—what happened: the Facebook programmers, like programmers everywhere, were strongly tempted to just write their own code instead of dealing with someone else’s. It’s not just about learning curves and fixing deficiencies, there are also issues of speed of changes and of control. At Fluidinfo we suffered through six months of maintaining our own set of related patches to Thrift before the Twisted support Esteve wrote was finally merged to trunk. That was painful and the temptation to scratch our own itch, fork, and forget about the official Thrift project was high.

Plus, Twisted suffers from the fact that the documentation is not as good as it could be. I commented at length on this over three years ago. Please read the follow-up posts in that thread for an illustration (one of many) of the maturity of the people running Twisted. Also note that the documentation has improved greatly since then. Nevertheless, Twisted is a huge project, it has tons of parts, and it’s difficult to document and to wrap your head around no matter what.

So you can understand why Facebook might have decided not to use Twisted. In their words:

We ended up writing our own web server and framework after looking at existing servers and tools like Twisted because none matched both our performance requirements and our ease-of-use requirements.

I’m betting it’s that last part that’s the key to the decision not to use Twisted.

But seriously…… WTF?

Twisted is an amazing piece of work, written by some truly brilliant coders, with huge experience doing exactly what Facebook set out to reinvent.

This is where I’m at a loss for words. I think: “what an historic missed opportunity” and “reinventing the wheel, badly” and “no, no, no, this cannot be” and “this is just so short-sighted” and “a crying shame” and many things besides.

Sure, Twisted is difficult to grok. But that’s no excuse. It’s a seriously cool and powerful framework, it’s vastly more sophisticated and useful and extensible than what Facebook have cobbled together. Facebook could have worked to improve twisted.web (which everyone agrees has some shortcomings) which could have benefitted greatly from even a small fraction of the resources Facebook must have put into Tornado. The end result would have been much better. Or Facebook could have just ignored twisted.web and built directly on top of the Twisted core. That would have been great too.

Or Facebook could have had a team of people who knew how to do it better, and produced something better than Twisted. I guess that’s the real frustration here – they’ve put a ton of effort into building something much more limited in scope and vision, and even the piece that they did build looks like a total hack built to scratch their short term needs.

What’s the biggest change in software engineering over the last decade? Arguably it’s the rise of test-driven development. I’m not the only one who thinks so. Yet here we are in late 2009 and Facebook have released a major piece of code with no test suite. Amazing. OK, you could argue this is a minor thing, that it’s not core to Tornado. That argument has some weight, but it’s hard to think that this thing is not a hack.

If you decide to use an asynchronous web framework, do you expect to have to manually set your sockets to be non-blocking? Do you feel like catching EWOULDBLOCK and EAGAIN yourself? Those sorts of things, and their non-portability (even within the world of Linux) are precisely the kinds of things that lead people away from writing C and towards doing rapid development using something robust and portable that looks after the details. They’re precisely the things Twisted takes care of for you, and which (at least in Twisted) work across platforms, including Windows.

It looks like Tornado uses a global reactor, something the Twisted folks learned the hard way is not the best solution.

Those are just some of the complaints I’ve heard and seen in the Tornado code. I confess I’ve looked only superficially at their code – but more than enough to feel such a sense of lost opportunity. They built a small subsection of Twisted, they’ve done it with much less experience and elegance and hiding of detail than the Twisted code, and the thing doesn’t even come with a test suite. Who knows if it actually works, or when, or where, etc.?

And…. Twisted is so much more. HTTP is just one of many protocols Twisted speaks, including (from their home page): “TCP, UDP, SSL/TLS, multicast, Unix sockets, a large number of protocols including HTTP, NNTP, IMAP, SSH, IRC, FTP, and others”.

Want to build a sophisticated, powerful, and flexible asynchronous internet service in Python? Use Twisted.

A beautiful thing about Twisted is that it expands your mind. Its abstractions (particularly the clean separation and generality of transports, protocols, factories, services, and Deferreds—see here and here and here) makes you a better programmer. As I wrote to some friends in April 2006: “Reading the Twisted docs makes me feel like my brain is growing new muscles.”

Twisted’s deferreds are extraordinarily elegant and powerful, I’ve blogged and posted to the Twisted mailing list about them on multiple occasions. Learning to think in the Twisted way has been a programming joy to me over the last 3 years, so you can perhaps imagine my dismay that a company with the resources of Facebook couldn’t be bothered to get behind it and had to go reinvent the wheel, and do it so poorly. What a pity.

In my case, I threw away an entire year of C code in order to use Twisted in FluidDB. That was a decision I didn’t take lightly. I’d written my own libraries to do lots of low-level network communications and RPC – including auto-generating server- and client-side glue libraries to serialize and unserialize RPC calls and args (a bit like Thrift), plus a server and tons of other stuff. I chucked it because it was too brittle. It was too much of a hack. It wasn’t portable enough. It was too hard to get the details right. It wasn’t extensible.

In other words….. it was too much like Tornado! So I threw it all away in favor of Twisted. As I happily tell people, FluidDB is written in Python so it can use Twisted. It was a question of an amazingly great asynchronous networking framework determining the choice of programming language. And this was done in spite of the fact that I thought the Twisted docs sucked badly. The people behind Twisted were clearly brilliant and the community was great. There was an opportunity to make a bet on something and to contribute. I wish Facebook had made the same decision. It’s everyone’s loss that they did not. What a great pity.


FluidDB has launched!

11:37 August 25th, 2009 by terry. Posted under me. | 2 Comments »

In case you missed it, FluidDB has (finally) launched. I won’t be blogging here about FluidDB or Fluidinfo, though I will continue to post personal things and of course random bits of code that seem interesting (and small) enough to warrant mention. I have yet another Twisted snippet coming up, though I’m not sure when I’ll get to it.

We’re all exhausted and thrilled to have FluidDB out the door. I won’t try to describe the feelings, except to say that it’s all incredibly exciting, and that I haven’t been getting much sleep recently. The reaction in the programmer community has been astounding: there are 9 client-side libraries already written (with more on the way), there are tools, there’s a FluidDB Explorer, and little apps are now starting to pop up. We couldn’t be happier. You can see a list of those things here.

To find out more about FluidDB, here are your best choices:

Thanks for reading along! The real journey is probably only just beginning…


Python code for retrieving all your tweets

21:44 June 24th, 2009 by terry. Posted under python, twitter. | 13 Comments »

Here’s a little Python code to pull back all of a user’s Twitter tweets. Make sure you read the notes at the bottom if you want to use it.

import sys, twitter, operator
from dateutil.parser import parse

twitterURL = 'http://twitter.com'

def fetch(user):
    data = {}
    api = twitter.Api()
    max_id = None
    total = 0
    while True:
        statuses = api.GetUserTimeline(user, count=200, max_id=max_id)
        newCount = ignCount = 0
        for s in statuses:
            if s.id in data:
                ignCount += 1
            else:
                data[s.id] = s
                newCount += 1
        total += newCount
        print >>sys.stderr, "Fetched %d/%d/%d new/old/total." % (
            newCount, ignCount, total)
        if newCount == 0:
            break
        max_id = min([s.id for s in statuses]) - 1
    return data.values()

def htmlPrint(user, tweets):
    for t in tweets:
        t.pdate = parse(t.created_at)
    key = operator.attrgetter('pdate')
    tweets = sorted(tweets, key=key)
    f = open('%s.html' % user, 'wb')
    print >>f, """<html><head><title>Tweets for %s</title></head>
<body>
<ol>""" % user
    for i, t in enumerate(tweets):
        print >>f, '<li>%d. %s <a href="%s/%s/status/%d">%s</a></li>' % (
            i, t.pdate.strftime('%Y-%m-%d %H:%M'),
            twitterURL, user, t.id, t.text.encode('utf8'))
    print >>f, '</ol></body></html>'
    f.close()

if __name__ == '__main__':
    user = 'terrycojones' if len(sys.argv) < 2 else sys.argv[1]
    data = fetch(user)
    htmlPrint(user, data)

Notes:

Fetch all of a user's tweets and write them to a file username.html (where username is given on the command line).

Output is to a file instead of to stdout as tweet texts are unicode and sys.stdout.encoding is ascii on my machine, which prevents printing non-ASCII chars.

This code uses the Python-Twitter library. You need to get (via SVN) the very latest version, and then you need to fix a tiny bug, described here. Or wait a while and the SVN trunk will be patched.

This worked flawlessly for my 2,300 tweets, but only retrieved about half the tweets of someone who had over 7,000. I'm not sure what happened there.

There are tons of things that could be done to make the output more attractive and useful. And yes, for nitpickers, the code has a couple of slight inefficiencies :-)


Paella

03:47 June 14th, 2009 by terry. Posted under other. | Comments Off on Paella

[Image: paella]


A middling tower

07:16 June 9th, 2009 by terry. Posted under Fluidinfo. | Comments Off on A middling tower

[Image: the Eiffel Tower]


2 cents

17:03 June 5th, 2009 by terry. Posted under companies, Fluidinfo, me. | Comments Off on 2 cents

My bank account hits rock bottom, at 2 cents, while building Fluidinfo.

[Image: BBVA bank statement with the 2-cent balance highlighted]


Full tilt at the center of the earth

17:06 May 31st, 2009 by terry. Posted under books, Faulkner. | Comments Off on Full tilt at the center of the earth

It was cold that morning, the first winter cold-snap; the hedgerows were rimed and stiff with frost and the standing water in the roadside drainage ditches was skimmed with ice and even the edges of the running water in the Nine Mile branch glinted fragile and scintillant like fairy glass and from the first farmyard they passed and then again and again and again came the windless tang of woodsmoke and they could see in the back yards the black iron pots already steaming while women in the sunbonnets still of summer or men’s old felt hats and long men’s overcoats stoked wood under them and the men with crokersack aprons tied with wire over their overalls whetted knives or already moved about the pens where hogs grunted and squealed, not quite startled, not alarmed but just alerted as though sensing already even though only dimly their rich and immanent destiny; by nightfall the whole land would be hung with their spectral intact tallowcolored empty carcasses immobilised by the heels in attitudes of frantic running as though full tilt at the center of the earth.

From William Faulkner’s Intruder in the Dust.


Slides from FluidDB talk at PGCon

17:10 May 26th, 2009 by terry. Posted under me. | Comments Off on Slides from FluidDB talk at PGCon

Here are the slides from my talk on May 22, 2009 at the Postgres Conference (PGCon) in Ottawa. The video will be available soon.


Talking at Postgres Conference (PGCon) in Ottawa

14:23 May 19th, 2009 by terry. Posted under me. | Comments Off on Talking at Postgres Conference (PGCon) in Ottawa

Here’s just a quick note to mention that I’m talking at the annual Postgres Conference aka PGCon. The talk is titled The design, architecture, and tradeoffs of FluidDB, and is at 3pm on May 22nd. So if you happen to be in Ottawa this week…

I could have added the subtitle “How someone who knows nothing about databases wound up in a project to build a database.”


A mixin class allowing Python __init__ methods to work with Twisted deferreds

17:54 May 11th, 2009 by terry. Posted under deferreds, python, twisted. | Comments Off on A mixin class allowing Python __init__ methods to work with Twisted deferreds

I posted to the Python Twisted list back in Nov 2008 with subject: A Python metaclass for Twisted allowing __init__ to return a Deferred

Briefly, I was trying to find a nice way to allow the __init__ method of a class to work with deferreds in such a way that methods of the class could use work done by __init__ safe in the knowledge that the deferreds had completed. E.g., if you have

class X(object):
    def __init__(self, host, port):
        def final(connection):
            self.db = connection
        d = makeDBConnection(host, port)
        d.addCallback(final)

    def query(self, q):
        return self.db.runQuery(q)

Then when you make an X and call query on it, there’s a chance the deferred won’t have fired, and you’ll get an error. This is just a very simple illustrative example. There are many more, and this is a general problem of the synchronous world (in which __init__ is supposed to prepare a fully-fledged class instance and cannot return a deferred) meeting the asynchronous world in which, as Twisted programmers, we would like to (and must) use deferreds.

The earlier thread, with lots of useful followups can be read here. Although I learned a lot in that thread, I wasn’t completely happy with any of the solutions. Some of the things that still bugged me are in posts towards the end of the thread (here and here).

The various approaches we took back then all boiled down to waiting for a deferred to fire before the class instance was fully ready to use. When that happened, you had your instance and could call its methods.

I had also thought about an alternate approach: having __init__ add a callback to the deferreds it dealt with to set a flag in self and then have all dependent methods check that flag to see if the class instance was ready for use. But that 1) is ugly (too much extra code); 2) means the caller has to be prepared to deal with errors due to the class instance not being ready, and 3) adds a check to every method call. It would look something like this:

class X(object):
    def __init__(self, host, port):
        self.ready = False
        def final(connection):
            self.db = connection
            self.ready = True
        d = makeDBConnection(host, port)
        d.addCallback(final)

    def query(self, q):
        if not self.ready:
            raise IAmNotReadyException()
        return self.db.runQuery(q)

That was too ugly for my taste, for all of the above reasons, most especially for forcing the unfortunate caller of my code to handle IAmNotReadyException.

Anyway… fast-forward 6 months and I’ve hit the same problem again. It’s with existing code, in which I would like an __init__ to call something that (now, due to changes elsewhere) returns a deferred. So I started thinking again, and came up with a much cleaner way to do the alternate approach, via a class mixin:

from twisted.internet import defer

class deferredInitMixin(object):
    def wrap(self, d, *wrappedMethods):
        self.waiting = []  # deferreds handed out while we're not yet ready
        self.stored = {}   # the original bound methods, keyed by name

        def restore(_):
            # The deferred from __init__ has fired: put the original
            # methods back, then fire everything that was parked waiting.
            for method in self.stored:
                setattr(self, method, self.stored[method])
            for waiting in self.waiting:
                waiting.callback(None)

        def makeWrapper(method):
            def wrapper(*args, **kw):
                # Hand back a fresh deferred that will run the original
                # method (with the original arguments) once restore runs.
                d = defer.Deferred()
                d.addCallback(lambda _: self.stored[method](*args, **kw))
                self.waiting.append(d)
                return d
            return wrapper

        # Stash each named method and replace it with a wrapper.
        for method in wrappedMethods:
            self.stored[method] = getattr(self, method)
            setattr(self, method, makeWrapper(method))

        d.addCallback(restore)

You use it as in the class Test below:

from twisted.internet import defer, reactor

def fire(d, value):
    print "I finally fired, with value", value
    d.callback(value)

def late(value):
    d = defer.Deferred()
    reactor.callLater(1, fire, d, value)
    return d

def called(result, what):
    print 'final callback of %s, result = %s' % (what, result)

def stop(_):
    reactor.stop()


class Test(deferredInitMixin):
    def __init__(self):
        d = late('Test')
        deferredInitMixin.wrap(self, d, 'f1', 'f2')

    def f1(self, arg):
        print "f1 called with", arg
        return late(arg)

    def f2(self, arg):
        print "f2 called with", arg
        return late(arg)


if __name__ == '__main__':
    t = Test()
    d1 = t.f1(44)
    d1.addCallback(called, 'f1')
    d2 = t.f1(33)
    d2.addCallback(called, 'f1')
    d3 = t.f2(11)
    d3.addCallback(called, 'f2')
    d = defer.DeferredList([d1, d2, d3])
    d.addBoth(stop)
    reactor.run()

Effectively, the __init__ of my Test class asks deferredInitMixin to temporarily wrap some of its methods. deferredInitMixin stores the original methods away and replaces each of them with a function that immediately returns a deferred. So after __init__ finishes, code that calls the now-wrapped methods of the class instance before the deferred has fired will get a deferred back as usual (but see * below). As far as callers know, everything is normal. Behind the scenes, deferredInitMixin has arranged for these deferreds to fire only after the deferred passed from __init__ has fired. Once that happens, deferredInitMixin also restores the original methods to the instance, so there is no later overhead of checking a flag to see whether the instance is ready for use. If the deferred from __init__ happens to fire before any of the instance’s methods are called, it simply restores the original methods. Finally (obviously?), you only pass deferredInitMixin the names of the methods that depend on the deferred in __init__ being done.

BTW, calling the methods passed to deferredInitMixin “wrapped” isn’t really accurate. They’re just temporarily replaced.
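That temporary replacement is plain Python: an attribute set on the instance shadows the method found on the class, and assigning the stored bound method back puts things right again, exactly as the mixin’s restore does. A tiny sketch (the class and names here are invented for illustration):

```python
class Thing(object):
    def greet(self):
        return 'hello'

t = Thing()
stored = t.greet                # stash the original bound method
t.greet = lambda: 'parked'      # instance attribute shadows the class method
assert t.greet() == 'parked'
t.greet = stored                # restore the original, as restore() does
assert t.greet() == 'hello'
```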

I quite like this approach. It’s a second example of something I posted about here, in which a pool of deferreds is accumulated and all fired when another deferred fires. It’s nice because you never have to reply with an error and there’s no need for locking or any other form of coordination – the work you need done is already in progress, so you get back a fresh deferred and everything goes swimmingly.

* Minor note: the methods you wrap should probably be ones that already return deferreds. That way callers always get a deferred back from them, whether they’re temporarily wrapped or not. The above mixin works just fine if you ask it to wrap non-deferred-returning methods, but callers then have to be prepared for them to return a deferred (i.e., when they’re called while still wrapped).


Balcony music

10:54 May 6th, 2009 by terry. Posted under music. | 41 Comments »

Today I discovered the wonderful Grooveshark and some thoughts occurred to me that I feel like writing down.

I haven’t spent much time thinking about rights over digital media, downloading, etc. I’ve tended to ignore the whole debate. So the following may all be commonplace observations. I have no idea.

It occurred to me that continued increases in the prevalence and bandwidth of internet access may end up solving a problem they helped create: we may simply be in a temporary uncomfortable phase that will soon be over.

The increase of broadband made it possible for people to download large music and video files. People had long been used to the traditional model of owning physical objects that contained their music and video: LPs, 8-track, cassettes, VHS cartridges, CDs, DVDs, etc. It was all physical property. We typically paid for it. I still have about 1,000 CDs, all paid for, sitting uselessly on my shelf.

The default frame of reference was the physical object that you bought in a store, brought home, physically put in a player, physically stored on a shelf, could lend to (and hopefully get back from) a friend. Broadband extended that, allowing us to download what we still thought of as physical objects. And they are physical objects in a real sense: occupying space on our digital hard disk shelves, needing organizational love and care, needing backups, etc.

Because the frame of reference was still physical objects, the media companies, who have their own opinion on the various rights – real and imagined – associated with these objects, had a way to go after the downloaders. They could point to the physical objects and say “hey, you stole that (object)”, or “you didn’t pay for that (object)”. They could even write worms and rootkits to dig into our computers looking for the objects, getting lists of them to hold up in court. And they had a point: where did you get that physical object after all?

But their argument, the frame of reference that shapes the debate, rests on ancient arguments: agreements and conventions regarding physical objects. Much of the law is based on these things.

The frame of reference might be due to change radically, kicking the legs out from under the music industry.

Imagine you’re walking down the street. You pass under a balcony and see open doors leading back into an apartment. There’s great music coming out of the doors, and you can hear it clearly down in the street. You stop to listen. Have you committed a crime? Would anyone even suggest that you had?

Someone comes out onto the balcony to stand in the sun. You call up and ask what the music is. They tell you, and you say how much you like it. They tell you they have other albums – and would you like to hear another song? You say yes, and stand down in the street while they put on another track. No crime there, right?

Suppose this balcony is in the building right next to yours. You go home and open your own balcony doors to be able to hear the music. You do that every day. Once in a while you bump into the neighbor in the street and comment on something else, maybe make a request. In the end the neighbor even suggests running a speaker wire into your apartment so you can hear their music whenever you like, even if it’s raining and everyone has their balcony doors closed. You buy a speaker with a volume control on it. Once in a while you even call your neighbor on the phone to ask them to play something again, or to put on a special track.

There’s no crime there, not even the hint of one. The media companies would probably like to protest. But the frame of reference has totally changed. We’ve gone from the mindset of physical possession of an object of questionable origin to that of walking down the street and hearing music.

And so it will go with increasing broadband. I’ve been listening to Clem Snide all day on Grooveshark. It’s streaming into my computer and directly to my speakers without being stored as a physical object on my machine. Entire tracks are not being physically stored: the music coming out my speakers and the data on my machine are just as ephemeral as they would be if I were walking down the street overhearing Clem Snide from someone else’s balcony.

Have I committed a crime? I find it very hard to argue that I have. OTOH, if I download a file and store it on my machine (which I have done many times, BTW) it’s very easy to argue that a crime of some sort is being committed. It’s easy to ignore that feeling too, but that’s not the point.

The reality is, I think, that we don’t actually want to own the physical objects. I don’t want a shelf full of physical CDs, and I don’t want a hard drive that’s 80% full of music files that I worry about and even back up.

How many times do you watch a DVD anyway? For many people it’s silly to buy a DVD because you can rent it much more cheaply, and you’re probably only going to watch it once or maybe twice. Music, for me at least, is different as I’ll sometimes listen to a single track 100-200 times. But I still don’t need or want to own it if I can just pull it up on demand via Grooveshark. I’d rather it was their disk space than mine, and the bandwidth interference with my normal work due to the streaming audio is increasingly hard to detect.

We may just be in a temporary uncomfortable stage that will be solved by the thing that got us here – increasing broadband access.

As bandwidth increases and becomes cheaper it seems like there will be a trend towards just streaming media and not downloading it to have and to hold until the RIAA or MPAA do us part.

At that point the frame of reference will change. It will become very difficult to maintain that a crime has been committed. To do so you’ll have to also argue that walking down the street and overhearing your neighbor’s music is also a crime. Good luck making that argument.


The first loud ding-dong of time and doom

10:05 May 5th, 2009 by terry. Posted under books, Faulkner. | Comments Off on The first loud ding-dong of time and doom

But not for a little while yet; for a little while yet the sparrows and the pigeons; garrulous myriad and independent the one, the other uxorious and interminable, at once frantic and tranquil – until the clock strikes again which even after a hundred years, they still seem unable to get used to, bursting in one swirling explosion out of the belfry as though the hour, instead of merely adding one puny infinitesimal more to the long weary increment since Genesis, had shattered the virgin pristine air with the first loud ding-dong of time and doom.

The final paragraph of Act 1, (The Courthouse) in Faulkner’s Requiem for a Nun.


OK, it’s a pandemic. Now what?

01:51 April 30th, 2009 by terry. Posted under me. | 35 Comments »

Here are some more thoughts on the (now official) influenza pandemic.

I would like again to emphasize that I’m not an authority and I’m not trying to pass myself off as one.

I’ve already been accused of deliberate fear-mongering. That’s the opposite of my purpose. On the contrary, it’s important to stay calm and there are good reasons for doing so. If you don’t want to know a bit of history and to have some sense of things that have happened in previous pandemics, then you don’t have to read what follows. There’s no harm in staying calm via not knowing. On the other hand, there is harm in being gripped by fear due to ignorance.

If you do read, try to keep in mind that the main point here is that you shouldn’t be overwhelmed by fear or begin to panic. There’s no reason to. Plus it will only make things worse.

On the subject of being an authority and fear-mongering, after I wrote up the first set of thoughts I was invited to be a guest commentator on a radio show. I declined. It feels very irresponsible to say anything about influenza given that I am not an expert and don’t even work in the field anymore, but OTOH it feels irresponsible to remain totally silent given that I know at least some historical things fairly well.

To illustrate the conflict: On April 26 it seemed crystal clear to me that the virus was going worldwide. You only had to have seen the Google map that I twittered about the day before to see that it was going to be all over the place in days. But I didn’t want to point that out, and when I was asked I told the asker to be his own judge. I had earlier linked to a map showing Mexico and the US and said “hopefully we won’t have to zoom out” – trying to get people to consider that we would probably soon have to zoom out.

So I think I’ve been quite restrained. This post is also restrained. As I said above, there are good reasons not to sensationalize things or to create the impression that people should panic.

So here we go again, a few more thoughts as they come to mind. These are things that I find interesting, with a few scattered opinions (all of which are just guesses). There’s no real structure to this post.

The WHO have announced today that we’re officially in a pandemic. That doesn’t really mean much, but it’s good to have a candid and early declaration – part of the problem historically has been slowness to even admit there’s a problem. The WHO didn’t even exist in 1918.

In case you don’t know, there’s pretty good evidence that humans have been fighting influenza for thousands of years.

The most interesting thing to me in reading about the 1918 pandemic is the social impact of the disease.

One thing to make clear is that the current pandemic is not the 1918 pandemic. I tend to agree with those who say that a pandemic of that nature could not take place today – but note that people, perhaps especially scientists, would have said the same at any time before (and after) 1918. We often under-estimate the forces of nature and over-estimate our own knowledge and level of control.

BTW, something like 75% of people who died during World War 1 did so because of the flu pandemic, which didn’t really take off until November of 1918. Amazing.

As I mentioned in my earlier post, under normal circumstances (even in a pandemic), flu doesn’t kill you. It leaves you susceptible to opportunistic follow-on disease. The good news is that we are vastly more informed now than we were in 1918 about the nature of infectious diseases. For example, we know a lot about pneumonia, which we did not in 1918. See the moving story of the amazing Oswald Avery, who dedicated his life to the disease and along the way fingered DNA as the vehicle of genetic inheritance – and never won a Nobel prize.

So the care of people who have been struck down by flu is going to be much more informed this time around. And it will probably be better in practice too. I put it that way because odd societal things happen in a pandemic. I hesitate to go into detail, because some people will assume that things that happened way back when will necessarily happen again this time around.

One of those things is that medical systems get overrun by the sick. Plus, doctors and nurses understandably decide that their jobs have become too dangerous and they stop showing up for work. So there can be a sharp drop-off in the availability of medical help.

So much so that there were reports of doctors and nurses being held hostage in houses in 1918. I.e., if you could get a doctor to visit to attend to your family, the situation was so dire you might consider pulling a gun on him/her and suggesting they make themselves comfortable for the duration.

The problem is not so much that many people are dying, it’s that a much larger number are simultaneously extremely ill and that panic grips them and everyone else. Roughly 30% of all people caught the 1918 flu. I have another post I may write up on that. Normal (epidemic) flu catches 10-15% of people in any given year.

Many of our systems are engineered to provide just-in-time resources, to cut the fat in order to maximize profitability, etc. That means that we’re closer to collapse than would seem apparent. How many days of fresh food are there on hand in a major city?

None of this is meant to be alarmist. But the reality is that alarming things have happened in the past.

Most interesting and revealing to me is that our cherished notions of politeness, of our generosity, our goodwill towards our neighbors, etc., can all go out the window pretty quickly. I’ve long held that all those things are the merest veneer on our underlying biological / evolutionary reality. We’re very fond of the ideas that we’re somehow no longer primates, that we’re not really the product of billions of years of evolutionary history, that somehow the last centuries of vaunted rationality have put paid to all those primitive lower impulses. I think that’s completely wrong. Behavior during a full-scale pandemic is one of the things that makes that very clear.

In a pandemic, if things get ugly, you can expect to see all manner of anti-social behavior. If you read John Barry’s book The Great Influenza or Crosby’s America’s Forgotten Pandemic you’ll get some graphic illustrations.

If I had a supply of tamiflu (which I don’t), I wouldn’t tell anyone. That’s deliberately anti-social. Ask yourself: What would you do if you had kids who were still healthy and your neighbor called you in desperation to tell you that his/her kids seemed to have come down with influenza? Get out your family’s tamiflu supply and hand it over? Lie? What if they knew you had it and you refused to give it or share? What if your neighbor’s kid died and yours never even got the flu? What kind of relationship would be left after the pandemic had passed?

This may all seem a bit extreme and deliberately provocative of me, and yet those sorts of dilemmas (sans tamiflu, naturally) were commonplace in 1918. As you might expect, they don’t always get resolved in ways that accord well with our preferred beliefs about our own natures in easier times. Crosby speculates that the reason the pandemic of 1918 is “forgotten” is due largely to the fact that it coincided with the war, and that people were generally exhausted and dispirited and wanting to move on. I’d speculate further that people en masse frequently behaved in ways that they weren’t proud of, and wanted to forget about it and act as if it hadn’t happened ASAP. That’s just a guess, of course.

In any case, if there’s a full-blown pandemic, societal structures that we take for granted are going to be hugely transformed. Medical services, emergency services, food supply, child care and education, job absenteeism, large numbers of the people who would normally be in charge of things coming down sick and being unable to do their normal jobs, etc. All sorts of things are impacted and lots of them are interconnected. The system breaks down in many unanticipated ways as all sorts of things that “could never happen” are all happening at once.

You might think I’m fear-mongering here, but I’m not. In fact I’m refraining from going into detail. Go read John Barry’s book, or any of the others, and see for yourself.

The important thing to remember in all this is that we are no longer in 1918. BTW, there were also influenza pandemics in 1957 (killing just a couple of million people) and 1968 (killing a mere million).

Apart from the fact that we’ve advanced hugely in medical terms, we are also much better connected. I can sit in the safety of my apartment in Barcelona and broadcast calming information like this blog post to thousands of people. We are better informed. We know that panic and fear greatly compound the impact of a pandemic. They feed on one another and prolong the systemic societal collapse. Because we can communicate so easily via the internet – provided our ISPs stay online – we can help keep each other calm. That’s an important advantage.

So my guess is that this wave isn’t going to be so bad, certainly not in terms of mortality. One thing to keep in mind though is that the virus isn’t going away. It will likely enjoy the Southern hemisphere winter, and we’ll see it again next Northern hemisphere winter. And yes, those are guesses. Because influenza is a single-stranded RNA virus it mutates rapidly (there’s no complementary second strand to provide error correction during copying). So this is the beginning, not the end, even if the pandemic fizzles out in the short term. It will be back – probably in less virulent form – but by then we’ll also have a good leg up on potential vaccines, and we’ll also know it’s coming.

OK, I’ll stop there for now. I have tons of other things I could write now that I’m warmed up. You can follow me on Twitter if you like, though I doubt I’ll be saying much about influenza.

If you truly believe I’m fear mongering, please send me an email or leave your email address in the comments. I’ll send you mail with some truly shocking and frightening stuff, or maybe fax you a few pages from some books. Believe me, it gets a lot nastier than anything I’ve described. Things are not that bad, certainly not yet. We’re not in 1918 anymore.

So, stay calm, and do the simple things to keep yourself relatively safe. If everyone follows instructions like those, the virus won’t have a chance to spread the way it could otherwise. That may sound like pat concluding advice, but there’s actually a lot to it – the epidemiology of infectious disease, in part the mathematics of infection, can be hugely altered depending on the behavior of the typical individual. Following basic hygiene, and getting your kids to do so too, will make a big difference. There’s no denying that this is going to get worse before it gets better, but we can each do our part to minimize its opportunities.
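That point about individual behavior altering the mathematics of infection can be illustrated with the textbook SIR model. This is my own sketch, not anything from the WHO or CDC, and the parameter values are illustrative, not estimates for this virus:

```python
def sir_final_size(beta, gamma=0.25, days=400, n=1000000, i0=10):
    """Crude discrete-time SIR model; returns the fraction ever infected.

    beta  - infections caused per infectious person per day (this is the
            behavior-dependent knob: contacts, hygiene, etc.)
    gamma - daily recovery rate (1/gamma = about a 4-day infectious period)
    """
    s, i, r = float(n - i0), float(i0), 0.0
    for _ in range(days):
        new_inf = beta * s * i / n   # new infections today
        new_rec = gamma * i          # recoveries today
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
    return r / n

high = sir_final_size(beta=0.5)  # R0 = beta/gamma = 2.0: business as usual
low = sir_final_size(beta=0.3)   # R0 = 1.2: fewer contacts, better hygiene
assert low < high                # modest behavior change, far fewer cases
```

Cutting beta from 0.5 to 0.3 is only a 40% reduction in transmission, yet in this toy model it dramatically shrinks the fraction of the population that ever gets infected – which is the substance behind the pat-sounding advice.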


A few comments on pandemic influenza

04:31 April 26th, 2009 by terry. Posted under me. | 94 Comments »

Here are some thoughts on the current swine influenza outbreak. These are just off the top of my head – I will undoubtedly think of more to say and add it in the comments or another posting. I apologize for the lack of links. I may come back and put some in.

I am both unqualified and qualified to make a few comments. I’m unqualified because I no longer work on influenza virus, because I’m not a virologist, because I have no inside information at all about the current outbreak. OTOH, I have some claim to know what I’m talking about. I worked on influenza virus as part of the Antigenic Cartography team at the University of Cambridge for a few years. We helped the WHO choose the H3N2 strain for the human vaccine. I’ve met the heads of the 4 international flu centers and even been in the WHO Situation Room in Geneva – a self-contained underground fortress. I spent a lot of time hanging out and talking to influenza virologists, many from the Erasmus Medical Centre in Rotterdam. I was even an author on a Science paper on the global spread of epidemic influenza. Plus I’ve read all the books on the 1918 pandemic, which gives some (largely retrospective) insight into what happened back then, and perhaps some insight into what could be about to happen.

I also feel it’s good for someone like me to comment because I’m outside the flu world and the people inside it will be unlikely to say much. Flu is a highly political issue, to put it mildly. People working in the flu research community will be reluctant to speak up. So I should make it very clear that the comments below are just my opinions, and don’t represent anyone else’s thoughts.

I’ll try to just make a few points that I think are fairly sober – neither alarmist, nor dismissive – and to keep speculation out of it.

Apart from the details of the actual virus, the social side of a potential pandemic is extraordinarily interesting. Very few people will have really concrete information, and those that do will still only be making their best guesses.

In a pandemic, or something that looks like it might be one, wild rumors sweep through the population. That will happen on an unprecedented scale this time round.

The virus has, as far as we know, not spent much time in humans yet. Once it does, it will begin to adapt itself in unpredictable ways. It may become more virulent, or less virulent. It may develop resistance to the antivirals that are currently effective. Antiviral resistance has been a topic of great concern for at least a couple of years. The current virus is already known to be resistant to both amantadine and rimantadine, though oseltamivir is still effective.

If you ask virologists what the probability is that there will be another pandemic, they’ll tell you it’s 1.0. It’s just a matter of time until it happens – like a recurrent, non-zero-probability state in a Markov process. When it does happen, what you do in the first phase is critically important. In the case of avian influenza, the response was to immediately cull all potentially infected birds, to stop the virus spreading and mutating and becoming more likely to enter the human population. When it did get into the human population, there was swift action to isolate it, again to reduce the spread and the time the virus had to adapt. In the case of the avian influenzas in humans, there has been very little airborne transmission, and we’re lucky for that. But the current virus seems already to have that property, which is of great concern.

It would be a miracle if the current epidemic vaccine provided any protection against this virus. The human vaccine does contain an H1N1 strain, but one picked based on human viruses sampled many months ago. The epidemic vaccine is aimed at thwarting what’s known as antigenic drift – the relatively slow accumulation of point mutations in the virus. Pandemic strains arise through antigenic shift, in which large chunks of viral genetic material, sometimes whole genes, are mixed between influenza viruses from different species. In a pandemic strain some of the genetic material, and the proteins it expresses, will very likely never have been seen by a human immune system.

The current WHO standard influenza test kit is not very useful in identifying this strain. They have issued instructions warning against false negatives.

Some aspects of the current outbreak are, to my mind, cause for great concern.

The acting-director of the CDC has already said: “There are things that we see that suggest that containment is not very likely.” That is a remarkably candid statement. I think it’s very clear that the cat is out of the bag. The question is how bad it’s going to be. That’s impossible to tell right now, because we do not know what the virus will look like in the future, after it has had time to mutate and adapt inside humans.

In normal circumstances it takes about 6 months to make the world’s supply of epidemic vaccine. It’s a long and difficult process requiring tons of virus to be grown in chicken eggs. A candidate vaccine strain has to be identified, and it has to be one that grows well in the chicken egg (including not killing the embryo). Even under the high pressure of a potential pandemic, making a new vaccine is going to take months. By then the virus may have moved on (via mutation) and the vaccine’s efficacy may be reduced. Note that the 1918 virus killed tens of millions of people over a period much shorter than this.

Diverting the world’s influenza resources to covering a pandemic threat necessarily diverts them from work on epidemic vaccines. Epidemic flu kills roughly 0.5M people a year as it is. Not being able to pay due attention to the epidemic strains is also a bad thing.

The new virus has been popping up in various places in the US in the last few days. I expect it will go global in the next couple of days, maximum. What’s to stop it? The virus has been isolated in several diverse areas and in many cases is genetically identical. The 1918 virus also popped up, in many cases inexplicably, across the US. The book America’s Forgotten Pandemic is worth a read.

There were 3 waves of the 1918/19 pandemic. The first was in summer of 1918 – very unusual, as influenza normally falls to extremely low rates during summer. Note that the current outbreak is also highly unseasonal.

The 1918 pandemic killed with a very unusual age pattern. Instead of peaks in just the very young and the very old, there was a W shape, with a huge number of young and healthy people who would not normally die from influenza. There are various conjectures as to the cause of this. The current virus is also killing young and healthy adults.

The social breakdown in a pandemic is extraordinary. If you read The Great Influenza by John Barry, you’ll get some sense of it. America’s Forgotten Pandemic also helps give some idea of what it must have been like.

No-one knows just how many people died as a result of the 1918 pandemic. Estimates generally range between 40M and 100M, and have trended upwards over the years. Influenza is not the easiest disease to diagnose (hence the category ILI – influenza-like illness). It also strips the throat of protective epithelial cells, leaving you susceptible to opportunistic follow-on infections, such as pneumonia, which often do the killing.

No-one knows how bad another pandemic might be in terms of mortality. Low estimates are in the single digit millions. Someone from the WHO suggested a significantly higher number about 4 years ago in the context of avian influenza and that number was quickly retracted. Jeff Taubenberger, who was responsible for resurrecting and sequencing the 1918 virus (an extraordinary story, related in a couple of books) has published work saying 100M might be possible. No-one knows, and it depends on many factors, including the characteristics of the virus, how early it is detected, how easily it spreads, how virulent it is (obviously), the social measures taken to combat it, antiviral resistance, and many other factors.

I don’t think anyone knows how the balance between vastly increased medical knowledge and vastly increased national and international travel will play out. If this virus is not popping up all over the world within a week’s time, I’ll be surprised. Airports are already screening people arriving from Mexico, but I imagine it’s too late and it’s certainly not being done globally.

History dictates that you should probably not believe anything any politician says about pandemic influenza. There has been a strong tendency to downplay risks. All sorts of factors are at work in communicating with the public. You can be sure that everything officially said by the WHO or CDC has been very carefully vetted and considered. There’s no particular reason to believe anything else you hear, either :-)

Facemasks have an interesting history, and have made it into law several times. In 1918 we didn’t even know what a virus was, let alone how tiny they are, so the gauze on the masks was likely totally ineffective.

In conclusion, I’d say that the thing is largely out of our hands for the time being. We’re going to have to wait and see what happens, and make our best guesses along the way.

The influenza people at the CDC and the other international labs are an amazing team of experts. They’ve been at this game for a very long time, they work extremely hard, and they generally get a bad rap. It’s no wonder flu is such a political issue: the responsibility is high and the tendency towards opaqueness is understandable. Despite all the expertise though, at bottom you have an extremely complex virus – much of whose behavior is unknown, especially in the case of antigenic shift, especially when the strain is so young, and especially when you don’t know what nearby mutational opportunities may exist for it in antigenic space – replicating in a vastly more complex environment (our bodies), with all of us moving and interacting in odd ways in an extremely interconnected world. It’s a wonder we know as much as we do, but in many ways we don’t know much at all.