Archive for October, 2009

Fault-tolerant Python Twisted classes for getting all Twitter friends or followers

Thursday, October 22nd, 2009

It’s been forever since I blogged here. I just wrote a little Python to grab all of a user’s friends or followers (or just their user ids). It uses Twisted, of course. There were two main reasons for doing this: 1) I want all friends/followers, not just the first bunch returned by the Twitter API, and 2) I wanted code that is fairly robust in the face of various 50x HTTP errors (I regularly experience INTERNAL_SERVER_ERROR, BAD_GATEWAY, and SERVICE_UNAVAILABLE).

If you want to use the code below and you’re not familiar with the Twitter API, consider whether you can use the FriendsIdFetcher and FollowersIdFetcher classes as they’ll do far fewer requests (you get 5000 results per API call, instead of 100). If you can live with user ids and do the occasional fetch of a full user, you’ll probably do far fewer API calls.

For the FriendsFetcher and FollowersFetcher classes, you get back a list of dictionaries, one per user. For FriendsIdFetcher and FollowersIdFetcher you get a list of Twitter user ids.

Of course there’s no documentation. Feel free to ask questions in the comments. Download the source.

import sys

from twisted.internet import defer
from twisted.web import client, error, http
   
if sys.hexversion >= 0x20600f0:
    import json
else:
    import simplejson as json

class _Fetcher(object):
    baseURL = ‘http://twitter.com/’
    URITemplate = None # Override in subclass.
    dataKey = None # Override in subclass.
    maxErrs = 10
    okErrs = (http.INTERNAL_SERVER_ERROR,
              http.BAD_GATEWAY,
              http.SERVICE_UNAVAILABLE)
   
    def __init__(self, name):
        assert self.baseURL.endswith(‘/’)
        self.results = []
        self.errCount = 0
        self.nextCursor = -1
        self.deferred = defer.Deferred()
        self.URL = self.baseURL + (self.URITemplate % { ‘name’ : name })

    def _fail(self, failure):
        failure.trap(error.Error)
        self.errCount += 1
        if (self.errCount < self.maxErrs and
            int(failure.value.status) in self.okErrs):
            self.fetch()
        else:
            self.deferred.errback(failure)
       
    def _parse(self, result):
        try:
            data = json.loads(result)
            self.nextCursor = data.get(‘next_cursor’)
            self.results.extend(data[self.dataKey])
        except Exception:
            self.deferred.errback()
        else:
            self.fetch()
           
    def _deDup(self):
        raise NotImplementedError(‘Override _deDup in subclasses.’)

    def fetch(self):
        if self.nextCursor:
            d = client.getPage(self.URL + ‘?cursor=%s’ % self.nextCursor)
            d.addCallback(self._parse)
            d.addErrback(self._fail)
        else:
            self.deferred.callback(self._deDup())
        return self.deferred

class _FriendsOrFollowersFetcher(_Fetcher):
    dataKey = u‘users’
   
    def _deDup(self):
        seen = set()
        result = []
        for userdict in self.results:
            uid = userdict[‘id’]
            if uid not in seen:
                result.append(userdict)
                seen.add(uid)
        return result

class _IdFetcher(_Fetcher):
    dataKey = u‘ids’
   
    def _deDup(self):
        # Keep the ids in the order we received them.
        seen = set()
        result = []
        for uid in self.results:
            if uid not in seen:
                result.append(uid)
                seen.add(uid)
        return result

class FriendsFetcher(_FriendsOrFollowersFetcher):
    URITemplate = ‘statuses/friends/%(name)s.json’

class FollowersFetcher(_FriendsOrFollowersFetcher):
    URITemplate = ‘statuses/followers/%(name)s.json’

class FriendsIdFetcher(_IdFetcher):
    URITemplate = ‘friends/ids/%(name)s.json’

class FollowersIdFetcher(_IdFetcher):
    URITemplate = ‘followers/ids/%(name)s.json’
 

Usage is dead simple:

fetcher = FriendsFetcher(‘terrycojones’)
d = fetcher.fetch()
d.addCallback(….) # etc.
 

Enjoy.

Crowdsourcing Arabic-to-English translation in the Geneva airport

Saturday, October 10th, 2009

Today I met an extraordinary Iranian man in the Geneva airport. He’s written a 1000 page book in Arabic about (at least in part) his experiences in Cyprus. He approached me, asked if my English was really really good, sat next to me, and started pulling out several pages of hand-wrtten uppercase English. He had me go over them, improve them, write some new text as he read his Arabic in halting English, told me exactly how he wanted it to sound, pressed me to find shorter ways to say things, and finally got me to write out (for his next helper, no doubt) a clean copy of all my work. He had me go look up a recent paper dating the evolutionary split between humans & chimpanzees and to confirm that it didn’t contradict his text (another fragment thrust importunately into my hands). He was about 75. We spent 90 mins together, smiling and congratulating each other over a few sentences that turned out particularly well. Told me he’s going to have it published by Oxford – that’s his aim anyway.

I thought to myself that we each have our own mountain to climb – or at least those who have a taste for years-long patient endeavors, but how different his from mine. We parted and he went off to approach another stranger. He’ll get the whole book done a few pages a day in the Geneva airport, I’ve no doubt. “It’s the perfect place” he told me. Amazing, extraordinary, humbling, etc…