Python code for retrieving all your tweets
Here’s a little Python code to pull back all of a user’s tweets. Make sure you read the notes at the bottom if you want to use it.
import operator
import sys

import twitter
from dateutil.parser import parse

twitterURL = 'http://twitter.com'

def fetch(user):
    """Return all retrievable statuses for user, paging backwards with max_id."""
    data = {}
    api = twitter.Api()
    max_id = None
    total = 0
    while True:
        statuses = api.GetUserTimeline(user, count=200, max_id=max_id)
        newCount = ignCount = 0
        for s in statuses:
            if s.id in data:
                ignCount += 1
            else:
                data[s.id] = s
                newCount += 1
        total += newCount
        print >>sys.stderr, "Fetched %d/%d/%d new/old/total." % (
            newCount, ignCount, total)
        if newCount == 0:
            break
        # Ask next for tweets strictly older than the oldest one in this batch.
        max_id = min([s.id for s in statuses]) - 1
    return data.values()

def htmlPrint(user, tweets):
    # Parse each creation time so the tweets can be sorted chronologically.
    for t in tweets:
        t.pdate = parse(t.created_at)
    key = operator.attrgetter('pdate')
    tweets = sorted(tweets, key=key)
    f = open('%s.html' % user, 'wb')
    print >>f, """<html><title>Tweets for %s</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<body><small>""" % user
    for i, t in enumerate(tweets):
        print >>f, '%d. %s <a href="%s/%s/status/%d">%s</a><br/>' % (
            i, t.pdate.strftime('%Y-%m-%d %H:%M'), twitterURL,
            user, t.id, t.text.encode('utf8'))
    print >>f, '</small></body></html>'
    f.close()

if __name__ == '__main__':
    user = 'terrycojones' if len(sys.argv) < 2 else sys.argv[1]
    data = fetch(user)
    htmlPrint(user, data)
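
The paging loop in fetch can be tried out without network access. What follows is a minimal sketch, not the Python-Twitter API: Status and fake_timeline are hypothetical stand-ins for the library's status objects and for api.GetUserTimeline, and fetch_all takes the page-fetching function as a parameter. It assumes, as the loop above does, that each page holds the newest tweets whose id is at most max_id.

```python
from collections import namedtuple

# Hypothetical stand-in for a status object; only the id matters here.
Status = namedtuple('Status', 'id')

# Pretend the user has 450 tweets with ids 1..450.
ALL_IDS = range(1, 451)


def fake_timeline(count=200, max_id=None):
    """Hypothetical page source: newest `count` statuses with id <= max_id."""
    ids = [i for i in ALL_IDS if max_id is None or i <= max_id]
    newest_first = sorted(ids, reverse=True)
    return [Status(i) for i in newest_first[:count]]


def fetch_all(get_page):
    """Collect every status by walking max_id backwards until nothing new arrives."""
    data = {}
    max_id = None
    while True:
        statuses = get_page(max_id=max_id)
        new = [s for s in statuses if s.id not in data]
        if not new:
            break
        for s in new:
            data[s.id] = s
        # Next request: strictly older than the oldest status in this page.
        max_id = min(s.id for s in statuses) - 1
    return list(data.values())


tweets = fetch_all(fake_timeline)
```

With 450 fake tweets and a page size of 200 this makes four requests: three that return data and a final empty one that ends the loop.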
Notes:
The script fetches all of a user’s tweets and writes them to a file username.html, where username is the name given on the command line (terrycojones if none is given).
Output goes to a file rather than to stdout because tweet texts are unicode and sys.stdout.encoding is ascii on my machine, which prevents printing non-ASCII characters.
This code uses the Python-Twitter library. You need to get (via SVN) the very latest version, and then you need to fix a tiny bug, described here. Or wait a while and the SVN trunk will be patched.
This worked flawlessly for my 2,300 tweets, but only retrieved about half the tweets of someone who had over 7,000. I’m not sure what happened there.
There are tons of things that could be done to make the output more attractive and useful. And yes, for nitpickers, the code has a couple of slight inefficiencies :-)
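
On the encoding note above: one alternative to encoding each tweet by hand is to open the output file with an explicit encoding, so unicode text is encoded as it is written. This is only a sketch, not what the script does: write_lines is a hypothetical helper, and io.open is used because it accepts an encoding argument on both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile


def write_lines(path, lines):
    """Write unicode strings to path as UTF-8, one per line (hypothetical helper)."""
    with io.open(path, 'w', encoding='utf-8') as f:
        for line in lines:
            f.write(line + u'\n')


# Example: text with non-ASCII characters round-trips through the file.
path = os.path.join(tempfile.mkdtemp(), 'tweets.txt')
texts = [u'caf\u00e9', u'na\u00efve']
write_lines(path, texts)

with io.open(path, encoding='utf-8') as f:
    read_back = f.read().splitlines()
```

Because the file object does the encoding, the strings stay unicode right up to the write call and sys.stdout.encoding never comes into play.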