
Python code for retrieving all your tweets

Here’s a little Python code to pull back all of a user’s Twitter tweets. If you want to use it, make sure you read the notes at the bottom first.

import sys, twitter, operator
from dateutil.parser import parse

twitterURL = 'http://twitter.com'

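def fetch(user):
    # Page backwards through the user's timeline, up to 200 statuses per
    # request, keeping statuses in a dict keyed by id so duplicates are dropped.
    data = {}
    api = twitter.Api()
    max_id = None
    total = 0
    while True:
        statuses = api.GetUserTimeline(user, count=200, max_id=max_id)
        newCount = ignCount = 0
        for s in statuses:
            if s.id in data:
                ignCount += 1
            else:
                data[s.id] = s
                newCount += 1
        total += newCount
        print >>sys.stderr, "Fetched %d/%d/%d new/old/total." % (
            newCount, ignCount, total)
        if newCount == 0:
            # Stop once a page brings back nothing new.
            break
        # Next request starts just below the oldest id in this page.
        max_id = min([s.id for s in statuses]) - 1
    return data.values()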

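def htmlPrint(user, tweets):
    # Parse each tweet's created_at string so the tweets can be sorted by date.
    for t in tweets:
        t.pdate = parse(t.created_at)
    key = operator.attrgetter('pdate')
    tweets = sorted(tweets, key=key)
    f = open('%s.html' % user, 'wb')
    print >>f, """<html><head><title>Tweets for %s</title></head>
<body>
""" % user
    for i, t in enumerate(tweets, 1):
        # One line per tweet: number, timestamp, and the text linked to the tweet.
        print >>f, '%d. %s <a href="%s/%s/status/%d">%s</a><br/>' % (
            i, t.pdate.strftime('%Y-%m-%d %H:%M'), twitterURL, user, t.id,
            t.text.encode('utf8'))
    print >>f, '</body></html>'
    f.close()

if __name__ == '__main__':
    # The username comes from the command line, defaulting to terrycojones.
    user = 'terrycojones' if len(sys.argv) < 2 else sys.argv[1]
    data = fetch(user)
    htmlPrint(user, data)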

Notes:

Fetch all of a user's tweets and write them to a file username.html (where username is given on the command line).

Output goes to a file rather than to stdout because tweet texts are Unicode and sys.stdout.encoding is ASCII on my machine, so printing non-ASCII characters fails.
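The script above encodes each tweet to UTF-8 by hand and writes bytes. A variation (a sketch, not what the script does) is to open the output file with an explicit encoding via io.open, which exists in Python 2.6+ and 3, and write text strings directly. The helper name write_utf8_lines and the temp-file usage are just for illustration.

```python
import io
import os
import tempfile


def write_utf8_lines(path, lines):
    """Write each text string in `lines` to `path` as UTF-8, one per line.

    Because the encoding is given explicitly, the result does not depend
    on sys.stdout.encoding or the locale.
    """
    # newline='\n' stops '\n' being translated to '\r\n' on Windows.
    with io.open(path, 'w', encoding='utf-8', newline='\n') as f:
        for line in lines:
            f.write(line + u'\n')


# Usage: write two non-ASCII strings, then read the raw bytes back.
path = os.path.join(tempfile.mkdtemp(), 'tweets.txt')
write_utf8_lines(path, [u'caf\u00e9', u'\u2603 snow'])
with io.open(path, 'rb') as f:
    raw = f.read()
print(raw)
```

The bytes on disk are plain UTF-8 regardless of what the terminal thinks its encoding is.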

This code uses the Python-Twitter library. You need to get (via SVN) the very latest version, and then you need to fix a tiny bug, described here. Or wait a while and the SVN trunk will be patched.

This worked flawlessly for my 2,300 tweets, but only retrieved about half the tweets of someone who had over 7,000. I'm not sure what happened there.
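Commenters below point to the likely cause: the API documentation quoted there says timeline paging stops at 3,200 statuses and returns an empty result beyond that. Here is a self-contained simulation of that effect. make_capped_timeline is a hypothetical stand-in for api.GetUserTimeline that serves only the newest cap statuses (treating max_id as inclusive), and fetch_ids is the same max_id loop as fetch above, reduced to bare integer ids.

```python
def make_capped_timeline(all_ids, cap):
    """Hypothetical stand-in for GetUserTimeline: only the `cap` newest
    ids are reachable; anything older comes back as an empty page."""
    reachable = sorted(all_ids, reverse=True)[:cap]   # newest first

    def get_page(count, max_id=None):
        page = [i for i in reachable if max_id is None or i <= max_id]
        return page[:count]
    return get_page


def fetch_ids(get_page, count=200):
    """The max_id paging loop from fetch(), over plain integer ids."""
    seen = set()
    max_id = None
    while True:
        page = get_page(count, max_id=max_id)
        new = [i for i in page if i not in seen]
        if not new:
            break                  # empty or all-duplicate page: stop
        seen.update(new)
        max_id = min(page) - 1     # next page starts below the oldest id
    return seen


# A 7,000-tweet account behind a 3,200-status cap.
get_page = make_capped_timeline(range(1, 7001), cap=3200)
print(len(fetch_ids(get_page)))
```

The loop ends as soon as the server starts returning empty pages, so a 7,000-tweet account comes back with 3,200 tweets (roughly half), while a 2,300-tweet account sits under the cap and comes back complete.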

There are tons of things that could be done to make the output more attractive and useful. And yes, for nitpickers, the code has a couple of slight inefficiencies :-)



13 Responses to “Python code for retrieving all your tweets”

  1. I think the number of tweets that can be retrieved with the API is limited. I could check the API docs, but I’m lazy. However, none of the tweets are lost.

  2. Cheers! I cut-n-pasted this into my terminal and then had to clean up “educated” quotes and at least one en-dash that was masquerading as a minus. Not sure if it’s Safari, your HTML, or EBCAK, but FYI.

  3. genius

  4. > This worked flawlessly for my 2,300 tweets, but only retrieved
    > about half the tweets of someone who had over 7,000. I'm not
    > sure what happened there.

    See the API docs: http://apiwiki.twitter.com/Things-Every-Developer-Should-Know#6Therearepaginationlimits

    “Clients may request up to 3,200 statuses via the page and count parameters for timeline REST API methods. Requests for more than the limit will result in a reply with a status code of 200 and an empty result in the format requested. Twitter still maintains a database of all the tweets sent by a user. However, to ensure performance of the site, this artificial limit is temporarily in place.”

  5. This is nice information; I need to know more.

    Thanks
    sam hardsy

  6. How can I retrieve all the tweets in the favorites timeline?

  7. Keerthantantry Says:

    I am a beginner in Python. Can you please tell me how to run the code? I have installed Python 2.7 and downloaded the python-twitter library too.

  8. Nice, but it didn’t work for me, not to mention those ‘smart quotes’.

    I prefer twitter-log username… it’s also Python and outputs plain text… (pip install twitter)