Posts Tagged ‘python’

Python code for retrieving all your tweets

Wednesday, June 24th, 2009

Here’s a little Python code to pull back all a user’s Twitter tweets. Make sure you read the notes at bottom in case you want to use it.

import sys, twitter, operator
from dateutil.parser import parse

twitterURL = ‘http://twitter.com’

def fetch(user):
    data = {}
    api = twitter.Api()
    max_id = None
    total = 0
    while True:
        statuses = api.GetUserTimeline(user, count=200, max_id=max_id)
        newCount = ignCount = 0
        for s in statuses:
            if s.id in data:
                ignCount += 1
            else:
                data[s.id] = s
                newCount += 1
        total += newCount
        print >>sys.stderr, "Fetched %d/%d/%d new/old/total." % (
            newCount, ignCount, total)
        if newCount == 0:
            break
        max_id = min([s.id for s in statuses])1
    return data.values()

def htmlPrint(user, tweets):
    for t in tweets:
        t.pdate = parse(t.created_at)
    key = operator.attrgetter(‘pdate’)
    tweets = sorted(tweets, key=key)
    f = open(‘%s.html’ % user, ‘wb’)
    print >>f, """<html><title>Tweets for %s</title>
    <meta http-equiv="
Content-Type" content="text/html;charset=utf-8">
    <body><small>"
"" % user
    for i, t in enumerate(tweets):
        print >>f, ‘%d. %s <a href="%s/%s/status/%d">%s</a><br/>’ % (
            i, t.pdate.strftime(‘%Y-%m-%d %H:%M’), twitterURL,
            user, t.id, t.text.encode(‘utf8′))
    print >>f, ‘</small></body></html>’
    f.close()
   
if __name__ == ‘__main__’:
    user = ‘terrycojones’ if len(sys.argv) < 2 else sys.argv[1]
    data = fetch(user)
    htmlPrint(user, data)
 

Notes:

Fetch all of a user’s tweets and write them to a file username.html (where username is given on the command line).

Output is to a file instead of to stdout as tweet texts are unicode and sys.stdout.encoding is ascii on my machine, which prevents printing non-ASCII chars.

This code uses the Python-Twitter library. You need to get (via SVN) the very latest version, and then you need to fix a tiny bug, described here. Or wait a while and the SVN trunk will be patched.

This worked flawlessly for my 2,300 tweets, but only retrieved about half the tweets of someone who had over 7,000. I’m not sure what happened there.

There are tons of things that could be done to make the output more attractive and useful. And yes, for nitpickers, the code has a couple of slight inefficiencies :-)

A mixin class allowing Python __init__ methods to work with Twisted deferreds

Monday, May 11th, 2009

I posted to the Python Twisted list back in Nov 2008 with subject: A Python metaclass for Twisted allowing __init__ to return a Deferred

Briefly, I was trying to find a nice way to allow the __init__ method of a class to work with deferreds in such a way that methods of the class could use work done by __init__ safe in the knowledge that the deferreds had completed. E.g., if you have

class X(object):
    def __init__(self, host, port):
        def final(connection):
            self.db = connection
        d = makeDBConnection(host, port)
        d.addCallback(final)

    def query(self, q):
        return self.db.runQuery(q)
 

Then when you make an X and call query on it, there’s a chance the deferred wont have fired, and you’ll get an error. This is just a very simple illustrative example. There are many more, and this is a general problem of the synchronous world (in which __init__ is supposed to prepare a fully-fledged class instance and cannot return a deferred) meeting the asynchronous world in which, as Twisted programmers, we would like to (and must) use deferreds.

The earlier thread, with lots of useful followups can be read here. Although I learned a lot in that thread, I wasn’t completely happy with any of the solutions. Some of the things that still bugged me are in posts towards the end of the thread (here and here).

The various approaches we took back then all boiled down to waiting for a deferred to fire before the class instance was fully ready to use. When that happened, you had your instance and could call its methods.

I had also thought about an alternate approach: having __init__ add a callback to the deferreds it dealt with to set a flag in self and then have all dependent methods check that flag to see if the class instance was ready for use. But that 1) is ugly (too much extra code); 2) means the caller has to be prepared to deal with errors due to the class instance not being ready, and 3) adds a check to every method call. It would look something like this:

class X(object):
    def __init__(self, host, port):
        self.ready = False
        def final(connection):
            self.db = connection
            self.ready = True
        d = makeDBConnection(host, port)
        d.addCallback(final)

    def query(self, q):
        if not self.ready:
            raise IAmNotReadyException()
        return self.db.runQuery(q)
 

That was too ugly for my taste, for all of the above reasons, most especially for forcing the unfortunate caller of my code to handle IAmNotReadyException.

Anyway…. fast forward 6 months and I’ve hit the same problem again. It’s with existing code, in which I would like an __init__ to call something that (now, due to changes elsewhere) returns a deferred. So I started thinking again, and came up with a much cleaner way to do the alternate approach via a class mixin:

from twisted.internet import defer

class deferredInitMixin(object):
    def wrap(self, d, *wrappedMethods):
        self.waiting = []
        self.stored = {}

        def restore(_):
            for method in self.stored:
                setattr(self, method, self.stored[method])
            for d in self.waiting:
                d.callback(None)

        def makeWrapper(method):
            def wrapper(*args, **kw):
                d = defer.Deferred()
                d.addCallback(lambda _: self.stored[method](*args, **kw))
                self.waiting.append(d)
                return d
            return wrapper

        for method in wrappedMethods:
            self.stored[method] = getattr(self, method)
            setattr(self, method, makeWrapper(method))

        d.addCallback(restore)
 

You use it as in the class Test below:

from twisted.internet import defer, reactor

def fire(d, value):
    print "I finally fired, with value", value
    d.callback(value)

def late(value):
    d = defer.Deferred()
    reactor.callLater(1, fire, d, value)
    return d

def called(result, what):
    print ‘final callback of %s, result = %s’ % (what, result)

def stop(_):
    reactor.stop()

class Test(deferredInitMixin):
    def __init__(self):
        d = late(‘Test’)
        deferredInitMixin.wrap(self, d, ‘f1′, ‘f2′)

    def f1(self, arg):
        print "f1 called with", arg
        return late(arg)

    def f2(self, arg):
        print "f2 called with", arg
        return late(arg)

if __name__ == ‘__main__’:
    t = Test()
    d1 = t.f1(44)
    d1.addCallback(called, ‘f1′)
    d2 = t.f1(33)
    d2.addCallback(called, ‘f1′)
    d3 = t.f2(11)
    d3.addCallback(called, ‘f2′)
    d = defer.DeferredList([d1, d2, d3])
    d.addBoth(stop)
    reactor.run()
 

Effectively, the __init__ of my Test class asks deferredInitMixin to temporarily wrap some of its methods. deferredInitMixin stores the original methods away and replaces each of them with a function that immediately returns a deferred. So after __init__ finishes, code that calls the now-wrapped methods of the class instance before the deferred has fired will get a deferred back as usual (but see * below). As far as they know, everything is normal. Behind the scenes, deferredInitMixin has arranged for these deferreds to fire only after the deferred passed from __init__ has fired. Once that happens, deferredInitMixin also restores the original functions to the instance. As a result there is no overhead later to check a flag to see if the instance is ready to use. If the deferred from __init__ happens to fire before any of the instance’s methods are called, it will simply restore the original methods. Finally (obviously?) you only pass the method names to deferredInitMixin that depend on the deferred in __init__ being done.

BTW, calling the methods passed to deferredInitMixin “wrapped” isn’t really accurate. They’re just temporarily replaced.

I quite like this approach. It’s a second example of something I posted about here, in which a pool of deferreds is accumulated and all fired when another deferred fires. It’s nice because you don’t reply with an error and there’s no need for locking or other form of coordination – the work you need done is already in progress, so you get back a fresh deferred and everything goes swimmingly.

* Minor note: the methods you wrap should probably be ones that already return deferreds. That way you always get a deferred back from them, whether they’re temporarily wrapped or not. The above mixin works just fine if you ask it to wrap non-deferred-returning functions, but you have to deal with the possibility that they will return a deferred (i.e., if you call them while they’re wrapped).

A Python metaclass for Twisted allowing __init__ to return a Deferred

Monday, November 3rd, 2008

OK, I admit, this is geeky.

But we’ve all run into the situation in which you’re using Python and Twisted, and you’re writing a new class and you want to call something from the __init method that returns a Deferred. This is a problem. The __init method is not allowed to return a value, let alone a Deferred. While you could just call the Deferred-returning function from inside your __init, there’s no guarantee of when that Deferred will fire. Seeing as you’re in your __init method, it’s a good bet that you need that function to have done its thing before you let anyone get their hands on an instance of your class.

For example, consider a class that provides access to a database table. You want the __init__ method to create the table in the db if it doesn’t already exist. But if you’re using Twisted’s twisted.enterprise.adbapi class, the runInteraction method returns a Deferred. You can call it to create the tables, but you don’t want the instance of your class back in the hands of whoever’s creating it until the table is created. Otherwise they might call a method on the instance that expects the table to be there.

A cumbersome solution would be to add a callback to the Deferred you get back from runInteraction and have that callback add an attribute to self to indicate that it is safe to proceed. Then all your class methods that access the db table would have to check to see if the attribute was on self, and take some alternate action if not. That’s going to get ugly very fast plus, your caller has to deal with you potentially not being ready.

I ran into this problem a couple of days ago and after scratching my head for a while I came up with an idea for how to solve this pretty cleanly via a Python metaclass. Here’s the metaclass code:

from twisted.internet import defer

class TxDeferredInitMeta(type):
    def __new__(mcl, classname, bases, classdict):
        hidden = ‘__hidden__’
        instantiate = ‘__instantiate__’
        for name in hidden, instantiate:
            if name in classdict:
                raise Exception(
                    ‘Class %s contains an illegally-named %s method’ %
                    (classname, name))
        try:
            origInit = classdict[‘__init__’]
        except KeyError:
            origInit = lambda self: None
        def newInit(self, *args, **kw):
            hiddenDict = dict(args=args, kw=kw, __init__=origInit)
            setattr(self, hidden, hiddenDict)
        def _instantiate(self):
            def addSelf(result):
                return (self, result)
            hiddenDict = getattr(self, hidden)
            d = defer.maybeDeferred(hiddenDict[‘__init__’], self,
                                    *hiddenDict[‘args’], **hiddenDict[‘kw’])
            return d.addCallback(addSelf)
        classdict[‘__init__’] = newInit
        classdict[instantiate] = _instantiate
        return super(TxDeferredInitMeta, mcl).__new__(
            mcl, classname, bases, classdict)
 

I’m not going to explain what it does here. If it’s not clear and you want to know, send me mail or post a comment. But I’ll show you how you use it in practice. It’s kind of weird, but it makes sense once you get used to it.

First, we make a class whose metaclass is TxDeferredInitMeta and whose __init__ method returns a deferred:

class MyClass(object):
    __metaclass__ = TxDeferredInitMeta
    def __init__(self):
        d = aFuncReturningADeferred()
        return d
 

Having __init__ return anything other than None is illegal in normal Python classes. But this is not a normal Python class, as you will now see.

Given our class, we use it like this:

def cb((instance, result)):
    # instance is an instance of MyClass
    # result is from the callback chain of aFuncReturningADeferred
    pass

d = MyClass()
d.__instantiate__()
d.addCallback(cb)
 

That may look pretty funky, but if you’re used to Twisted it wont seem too bizarre. What’s happening is that when you ask to make an instance of MyClass, you get back an instance of a regular Python class. It has a method called __instantiate__ that returns a Deferred. You add a callback to that Deferred and that callback is eventually passed two things. The first is an instance of MyClass, as you requested. The second is the result that came down the callback chain from the Deferred that was returned by the __init__ method you wrote in MyClass.

The net result is that you have the value of the Deferred and you have your instance of MyClass. It’s safe to go ahead and use the instance because you know the Deferred has been called. It will probably seem a bit odd to get your instance later as a result of a Deferred firing, but that’s perfectly in keeping with the Twisted way.

That’s it for now. You can grab the code and a trial test suite to put it through its paces at http://foss.fluidinfo.com/txDeferredInitMeta.zip. The code could be cleaned up somewhat, and made more general. There is a caveat to using it – your class can’t have __hidden__ or __instantiate__ methods. That could be improved. But I’m not going to bother for now, unless someone cares.