
Fault-tolerant Python Twisted classes for getting all Twitter friends or followers

It’s been forever since I blogged here. I just wrote a little Python to grab all of a user’s friends or followers (or just their user ids). It uses Twisted, of course. There were two main reasons for doing this: 1) I wanted all friends/followers, not just the first batch returned by the Twitter API, and 2) I wanted code that is fairly robust in the face of the various 5xx HTTP errors (I regularly see INTERNAL_SERVER_ERROR, BAD_GATEWAY, and SERVICE_UNAVAILABLE). The code retries on those three statuses, up to a limit of 10 errors per fetcher, and gives up on anything else.

If you want to use the code below and you’re not familiar with the Twitter API, consider whether you can use the FriendsIdFetcher and FollowersIdFetcher classes instead: each API call returns 5000 ids, versus 100 full users. If you can live with user ids plus the occasional fetch of a full user, you’ll probably make far fewer API calls overall.
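To put rough numbers on that, here is a back-of-the-envelope sketch. The page sizes (100 and 5000) are the ones mentioned above; the follower count and the calls_needed helper are made up purely for illustration.

```python
import math


def calls_needed(n_users, per_call):
    """Number of paged API calls needed to list n_users."""
    return int(math.ceil(n_users / float(per_call)))


n = 12000  # a made-up follower count, for illustration only
print(calls_needed(n, 100))   # full user dicts: 120 calls
print(calls_needed(n, 5000))  # ids only: 3 calls
```

That count ignores any retries after a server error, so treat it as a lower bound.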

For the FriendsFetcher and FollowersFetcher classes, you get back a list of dictionaries, one per user. For FriendsIdFetcher and FollowersIdFetcher you get a list of Twitter user ids. In both cases duplicates are removed and the results stay in the order they arrived.

Of course there’s no documentation. Feel free to ask questions in the comments. Download the source.

import sys

from twisted.internet import defer
from twisted.web import client, error, http
    
if sys.hexversion >= 0x20600f0:
    import json
else:
    import simplejson as json

class _Fetcher(object):
    baseURL = 'http://twitter.com/'
    URITemplate = None # Override in subclass.
    dataKey = None # Override in subclass.
    maxErrs = 10
    okErrs = (http.INTERNAL_SERVER_ERROR,
              http.BAD_GATEWAY,
              http.SERVICE_UNAVAILABLE)
    
    def __init__(self, name):
        assert self.baseURL.endswith('/')
        self.results = []
        self.errCount = 0
        self.nextCursor = -1
        self.deferred = defer.Deferred()
        self.URL = self.baseURL + (self.URITemplate % { 'name' : name })

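    def _fail(self, failure):
        # Retry only on the HTTP errors listed in okErrs. Everything else,
        # including non-HTTP failures, goes to the caller's deferred.
        # (failure.trap would re-raise those and self.deferred would
        # never fire.)
        self.errCount += 1
        if (failure.check(error.Error) and
            self.errCount < self.maxErrs and
            int(failure.value.status) in self.okErrs):
            self.fetch()
        else:
            self.deferred.errback(failure)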
        
    def _parse(self, result):
        try:
            data = json.loads(result)
            self.nextCursor = data.get('next_cursor')
            self.results.extend(data[self.dataKey])
        except Exception:
            # With no argument, errback() wraps the exception being handled.
            self.deferred.errback()
        else:
            self.fetch()
            
    def _deDup(self):
        raise NotImplementedError('Override _deDup in subclasses.')

    def fetch(self):
        if self.nextCursor:
            d = client.getPage(self.URL + '?cursor=%s' % self.nextCursor)
            d.addCallback(self._parse)
            d.addErrback(self._fail)
        else:
            self.deferred.callback(self._deDup())
        return self.deferred

class _FriendsOrFollowersFetcher(_Fetcher):
    dataKey = u'users'
    
    def _deDup(self):
        seen = set()
        result = []
        for userdict in self.results:
            uid = userdict['id']
            if uid not in seen:
                result.append(userdict)
                seen.add(uid)
        return result

class _IdFetcher(_Fetcher):
    dataKey = u'ids'
    
    def _deDup(self):
        # Keep the ids in the order we received them.
        seen = set()
        result = []
        for uid in self.results:
            if uid not in seen:
                result.append(uid)
                seen.add(uid)
        return result

class FriendsFetcher(_FriendsOrFollowersFetcher):
    URITemplate = 'statuses/friends/%(name)s.json'

class FollowersFetcher(_FriendsOrFollowersFetcher):
    URITemplate = 'statuses/followers/%(name)s.json'

class FriendsIdFetcher(_IdFetcher):
    URITemplate = 'friends/ids/%(name)s.json'

class FollowersIdFetcher(_IdFetcher):
    URITemplate = 'followers/ids/%(name)s.json'
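In case the Deferred plumbing obscures it, the loop those classes run is roughly the following. This is a synchronous sketch, not the Twisted code itself: get_page is a hypothetical stand-in for the HTTP request, HTTPError stands in for twisted.web.error.Error, and the fake pages at the bottom are invented test data.

```python
RETRYABLE = (500, 502, 503)  # the same three statuses as okErrs
MAX_ERRS = 10


class HTTPError(Exception):
    """Hypothetical stand-in for twisted.web.error.Error."""
    def __init__(self, status):
        Exception.__init__(self, status)
        self.status = status


def fetch_all(get_page, data_key):
    """Follow cursors until next_cursor is falsy, retrying retryable errors."""
    results = []
    cursor = -1    # -1 asks for the first page
    err_count = 0  # counted across the whole fetch, not per page
    while cursor:
        try:
            data = get_page(cursor)
        except HTTPError as e:
            err_count += 1
            if err_count < MAX_ERRS and int(e.status) in RETRYABLE:
                continue  # same cursor, try again
            raise
        cursor = data.get('next_cursor')
        results.extend(data[data_key])
    # De-duplicate, keeping first-seen order.
    seen = set()
    unique = []
    for item in results:
        if item not in seen:
            unique.append(item)
            seen.add(item)
    return unique


# Invented data: two pages with one overlapping id, and one 503 up front.
pages = {-1: {'ids': [1, 2], 'next_cursor': 7},
         7: {'ids': [2, 3], 'next_cursor': 0}}
failures = [503]


def fake_get_page(cursor):
    if failures:
        raise HTTPError(failures.pop())
    return pages[cursor]


print(fetch_all(fake_get_page, 'ids'))  # prints [1, 2, 3]
```

Note that the error budget is shared across all pages, as in the classes above, so a long fetch over a flaky connection can still run out of retries.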

Usage is dead simple:

fetcher = FriendsFetcher('terrycojones')
d = fetcher.fetch()
d.addCallback(....) # etc.
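
A slightly fuller sketch, with an errback and a running reactor, might look like this. It assumes the classes above were saved in a module called fetchers (a hypothetical name) and that each user dictionary carries a screen_name key; the screen_names helper is only there for illustration.

```python
def screen_names(users):
    """Pull the screen_name out of each user dict, skipping any without one."""
    return [u['screen_name'] for u in users if 'screen_name' in u]


def main():
    # Imported here so the helper above can be used without Twisted.
    from twisted.internet import reactor
    from fetchers import FriendsFetcher  # hypothetical module name

    def done(users):
        print('\n'.join(screen_names(users)))

    def failed(failure):
        print('Fetch failed: %s' % failure.getErrorMessage())

    d = FriendsFetcher('terrycojones').fetch()
    d.addCallbacks(done, failed)
    # Stop the reactor whether the fetch succeeded or not.
    d.addBoth(lambda _: reactor.stop())
    reactor.run()
```

Call main() to run it. The addBoth at the end matters: without it the reactor keeps running after the fetch finishes.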

Enjoy.

