Archive for the ‘twitter’ Category

Fault-tolerant Python Twisted classes for getting all Twitter friends or followers

Thursday, October 22nd, 2009

It’s been forever since I blogged here. I just wrote a little Python to grab all of a user’s friends or followers (or just their user ids). It uses Twisted, of course. There were two main reasons for doing this: 1) I want all friends/followers, not just the first bunch returned by the Twitter API, and 2) I wanted code that is fairly robust in the face of various 50x HTTP errors (I regularly experience INTERNAL_SERVER_ERROR, BAD_GATEWAY, and SERVICE_UNAVAILABLE).

If you want to use the code below and you’re not familiar with the Twitter API, consider whether you can use the FriendsIdFetcher and FollowersIdFetcher classes as they’ll do far fewer requests (you get 5000 results per API call, instead of 100). If you can live with user ids and do the occasional fetch of a full user, you’ll probably do far fewer API calls.

For the FriendsFetcher and FollowersFetcher classes, you get back a list of dictionaries, one per user. For FriendsIdFetcher and FollowersIdFetcher you get a list of Twitter user ids.

Of course there’s no documentation. Feel free to ask questions in the comments. Download the source.

import sys

from twisted.internet import defer
from twisted.web import client, error, http
   
if sys.hexversion >= 0x20600f0:
    import json
else:
    import simplejson as json

class _Fetcher(object):
    baseURL = ‘http://twitter.com/’
    URITemplate = None # Override in subclass.
    dataKey = None # Override in subclass.
    maxErrs = 10
    okErrs = (http.INTERNAL_SERVER_ERROR,
              http.BAD_GATEWAY,
              http.SERVICE_UNAVAILABLE)
   
    def __init__(self, name):
        assert self.baseURL.endswith(‘/’)
        self.results = []
        self.errCount = 0
        self.nextCursor = -1
        self.deferred = defer.Deferred()
        self.URL = self.baseURL + (self.URITemplate % { ‘name’ : name })

    def _fail(self, failure):
        failure.trap(error.Error)
        self.errCount += 1
        if (self.errCount < self.maxErrs and
            int(failure.value.status) in self.okErrs):
            self.fetch()
        else:
            self.deferred.errback(failure)
       
    def _parse(self, result):
        try:
            data = json.loads(result)
            self.nextCursor = data.get(‘next_cursor’)
            self.results.extend(data[self.dataKey])
        except Exception:
            self.deferred.errback()
        else:
            self.fetch()
           
    def _deDup(self):
        raise NotImplementedError(‘Override _deDup in subclasses.’)

    def fetch(self):
        if self.nextCursor:
            d = client.getPage(self.URL + ‘?cursor=%s’ % self.nextCursor)
            d.addCallback(self._parse)
            d.addErrback(self._fail)
        else:
            self.deferred.callback(self._deDup())
        return self.deferred

class _FriendsOrFollowersFetcher(_Fetcher):
    dataKey = u‘users’
   
    def _deDup(self):
        seen = set()
        result = []
        for userdict in self.results:
            uid = userdict[‘id’]
            if uid not in seen:
                result.append(userdict)
                seen.add(uid)
        return result

class _IdFetcher(_Fetcher):
    dataKey = u‘ids’
   
    def _deDup(self):
        # Keep the ids in the order we received them.
        seen = set()
        result = []
        for uid in self.results:
            if uid not in seen:
                result.append(uid)
                seen.add(uid)
        return result

class FriendsFetcher(_FriendsOrFollowersFetcher):
    URITemplate = ‘statuses/friends/%(name)s.json’

class FollowersFetcher(_FriendsOrFollowersFetcher):
    URITemplate = ‘statuses/followers/%(name)s.json’

class FriendsIdFetcher(_IdFetcher):
    URITemplate = ‘friends/ids/%(name)s.json’

class FollowersIdFetcher(_IdFetcher):
    URITemplate = ‘followers/ids/%(name)s.json’
 

Usage is dead simple:

fetcher = FriendsFetcher(‘terrycojones’)
d = fetcher.fetch()
d.addCallback(….) # etc.
 

Enjoy.

Python code for retrieving all your tweets

Wednesday, June 24th, 2009

Here’s a little Python code to pull back all a user’s Twitter tweets. Make sure you read the notes at bottom in case you want to use it.

import sys, twitter, operator
from dateutil.parser import parse

twitterURL = ‘http://twitter.com’

def fetch(user):
    data = {}
    api = twitter.Api()
    max_id = None
    total = 0
    while True:
        statuses = api.GetUserTimeline(user, count=200, max_id=max_id)
        newCount = ignCount = 0
        for s in statuses:
            if s.id in data:
                ignCount += 1
            else:
                data[s.id] = s
                newCount += 1
        total += newCount
        print >>sys.stderr, "Fetched %d/%d/%d new/old/total." % (
            newCount, ignCount, total)
        if newCount == 0:
            break
        max_id = min([s.id for s in statuses])1
    return data.values()

def htmlPrint(user, tweets):
    for t in tweets:
        t.pdate = parse(t.created_at)
    key = operator.attrgetter(‘pdate’)
    tweets = sorted(tweets, key=key)
    f = open(‘%s.html’ % user, ‘wb’)
    print >>f, """<html><title>Tweets for %s</title>
    <meta http-equiv="
Content-Type" content="text/html;charset=utf-8">
    <body><small>"
"" % user
    for i, t in enumerate(tweets):
        print >>f, ‘%d. %s <a href="%s/%s/status/%d">%s</a><br/>’ % (
            i, t.pdate.strftime(‘%Y-%m-%d %H:%M’), twitterURL,
            user, t.id, t.text.encode(‘utf8′))
    print >>f, ‘</small></body></html>’
    f.close()
   
if __name__ == ‘__main__’:
    user = ‘terrycojones’ if len(sys.argv) < 2 else sys.argv[1]
    data = fetch(user)
    htmlPrint(user, data)
 

Notes:

Fetch all of a user’s tweets and write them to a file username.html (where username is given on the command line).

Output is to a file instead of to stdout as tweet texts are unicode and sys.stdout.encoding is ascii on my machine, which prevents printing non-ASCII chars.

This code uses the Python-Twitter library. You need to get (via SVN) the very latest version, and then you need to fix a tiny bug, described here. Or wait a while and the SVN trunk will be patched.

This worked flawlessly for my 2,300 tweets, but only retrieved about half the tweets of someone who had over 7,000. I’m not sure what happened there.

There are tons of things that could be done to make the output more attractive and useful. And yes, for nitpickers, the code has a couple of slight inefficiencies :-)

FluidDB domain names available early (and free) for Twitter users

Saturday, January 24th, 2009

Sometime in the next few months, Fluidinfo will launch an alpha version of FluidDB, the database with the heart of a wiki. It’s a big engineering task, and there will still be a lot to do when we go into alpha, so we’ll initially only have a small number of applications being built on FluidDB.

But that doesn’t mean you can’t get into the action early.

Starting today, we’re pleased to offer FluidDB domains for free to Twitter users. This is perhaps the simplest way you’ll ever sign up for a new web service – if you’re a Twitter user:

Simply follow FluidDB on Twitter.

Yes, that’s it. You’re done.

Later, when we create your FluidDB domain, we’ll send you your FluidDB password via a direct message in Twitter. Note that we haven’t asked for your real name, your email, a password, sent you a cookie, or asked you to fill out a pesky sign-up form. The point here is simply to give you an early opportunity to trivially claim your preferred name.

Feel free to tweet the URL of this posting (http://bit.ly/bezc). You can follow me too for extra credit. If you’re not already a Twitter user and you want a free FluidDB domain name, sign up for Twitter, and then follow FluidDB.

Mini FAQ:

Why would I do this? By following FluidDB you will reserve your (Twitter) user name as your domain name in FluidDB.

Is there any charge? No.

What is a FluidDB domain? Sorry, but you’ll have to wait to find out the answer to this. We can tell you though that FluidDB domains will have many uses, and that they wont all be free.

What if I change my mind? Just unfollow FluidDB on Twitter.

Why Twitter? Because we like Twitter. We may do a similar thing for other services, allowing users to later claim their domain via OpenID, but that introduces the potential of naming conflicts.

Finally, please note that we can’t give an iron-clad guarantee that you’ll get your Twitter user name as your FluidDB domain name, but we’ll do our best. At this early stage of the game, we reserve the right to do whatever we want :-)

Who signed up for Twitter immediately before/after you?

Wednesday, January 14th, 2009

This is just a quick hack, done in about 20 minutes in 32 lines of Python. The following script will print out the Twitter screen names of the people who signed up immediately before and after a given user.

import sys
from twitter import Api
from operator import add
from functools import partial

inc = partial(add, 1)
dec = partial(add, -1)
api = Api()

def getUser(u):
    try:
        return api.GetUser(u)
    except Exception:
        return None

def do(name):
    user = getUser(name)
    if user:
        for f, what in (dec, ‘Before:’), (inc, ‘After:’):
            i = user.id
            while True:
                i = f(i)
                u = getUser(i)
                if u:
                    print what, u.screen_name
                    break
    else:
        print ‘Could not find user %r’ % name

if __name__ == ‘__main__’:
    for name in sys.argv[1:]:
        do(name)
 

I’m happy to have reached the point in my Python development where I can pretty much just type something like this in without really having to think, including the use of operator.add and functools.partial.

BTW, the users who signed up immediately before and after I did were skywalker and kitu012.

The above is just a hack. Notes:

  1. If it can’t retrieve a user for any reason, it just assumes there is no such user.
  2. Twitter periodically deletes accounts of abusers, so the answer will skip those.
  3. Twitter had lots of early hiccups, so there may be no guarantee that user ids were actually assigned sequentially.
  4. This script may run forever.
  5. I’m using the Python Twitter library written by DeWitt Clinton. It’s been a while since it was updated, and it doesn’t give you back the time a user was created in Twitter. It would be fun to print that too.

As you were.

10,000 things: Andrew Hensel lives (on Twitter)

Monday, January 5th, 2009

Andrew Hensel was an extraordinary human being.

We were graduate students together at The University of Waterloo in Canada in 1986-88. I met him on my first day there and we spent many hours together on a daily basis over the next 2.5 years. I don’t want to try to say too much about him now. It occurred to me a few days ago that I might post a few stories here. We did lots of crazy things. At one point I had wanted to write something titled “100 things to a Hensel” and I made a bunch of notes, but it went no further.

I wrote about him in my Ph.D. acknowledgments in 1995:

Andrew Hensel, with whom I shared so much of my two and a half years at Waterloo, was the most original and creative person I have ever known well. Together, we dismantled the world and rebuilt it on our own crazy terms. We lived life at a million miles an hour and there was nothing like it. Five years ago, Andrew killed himself. There have been few days since then that I have not thought of him and the time we spent together.

I still think about him frequently. Today I was remembering one of his many, many oddball projects (most of which went unfinished), which he called “10,000 things”. It was to be a list of 10,000 things that he thought of. By the time he started sending them to me we had both dropped out of Waterloo. He was back in Australia and I was in Munich.

He only sent me 300 of the to-be 10,000. Of course I still have them. They’re all very short. At the risk of being thought macabre I’ve decided to bring Andrew back a very little and post them to Twitter, chosen at random, one a day. You can follow adhensel to get just a glimpse of his mind. The first tweet, people being planted into earth, is already up.

There are at least half a dozen twitterers who knew Andrew, including one who knew him probably better than anybody. Once in a while I get email from someone who finds my online mentions of him. Invariably they also found him extraordinary.

What would Andrew have made of Twitter? I have no doubt at all that he’d have immediately dismissed it as “weak”. That was one of his favorite adjectives. Almost everything was weak. It’s a small miracle to me to partly bring him back to life 18 years after he died, by posting just some of his 10,000 things to Twitter.

And… my apologies to anyone who knew Andrew and who finds this upsetting.

Twittendipity: a chance interview with Robert Scoble

Thursday, December 4th, 2008

On Monday Tim O’Reilly posted a Twitter tweet suggesting to Robert Scoble that he contact me while in Barcelona.

First off, Tim is very generous in doing this. He’s ultra connected and he spends a significant amount of his time in Twitter pointing things out, connecting people, and re-tweeting stuff he finds interesting. Re-tweeting is really important because when you tweet you only reach the people who are already following you. But when someone re-tweets you, you reach new people who likely have no idea of your existence. And when Tim does the re-tweeting there can be a big impact. 24 hours after his message to Robert I had 50 new followers. Tim explicitly tries to help people doing things he finds interesting, but who have just a small number of Twitter followers. He filters and amplifies information, broadcasting it out to his 16,000+ followers. Robert was in a hotel about 10 minutes’ walk from my place and I had no idea. A mutual friend in California noticed and took a minute to connect us. That’s really something, and it perfectly illustrates some of the value of Twitter.

I met Robert yesterday afternoon and we spent 6 hours together. It was great. You can see at once why he’s been so successful: he’s smart, he’s thoughtful, he’s sympathetic, and he’s a careful listener. I had no idea what to expect, and seeing as what we’re building can take some time to sink in, I wondered what sort of an audience he’d be.

After we’d climbed around up in the Sagrada Familia (official site, wikipedia), Robert came back to my place to see a demo of the things I’d been describing. We sat down and he pulled out his cell phone and asked if he could film me. I didn’t really think about it and said of course. It didn’t dawn on me that we were doing an informal interview, and I was totally unprepared – which is probably a good thing.

In the end we filmed 4 segments: parts one, two, three, and four. There’s also been some discussion here on Robert’s FriendFeed page.

So if you’ve been wondering what we’re building in here, go watch the videos.

I had no idea all this was about to come down. The Fluidinfo web site (a generous word) was a single page with no contact information, no nuthin’. We simply haven’t needed a web site of any description yet. I went and added a box so you can sign up to receive news of the alpha launch.

And then there was this, posted on Twitter, and which I have absolutely no shame in reproducing (this is a blog, after all):

Wow, what @terrycojones showed me last night (a new kind of database that he’s been workng on for 11 years) blew me away. Uploading vids now

Now I have to put my head back down with Esteve to get the alpha out the door ASAP.

Changing POV under Twitter

Wednesday, November 26th, 2008

One thing I’d like to be able to do in Twitter is change my point of view. That is, see what Twitter looks like from the POV of another user.

Given Twitter’s asymmetric follower model and the prevalence of @ messaging, it’s very common to run across a fragment of a conversation that seems potentially interesting. It’s also common not to be following the full set of people who are interacting.

For example, four people might be exchanging tweets on a subject, and you may follow just one of them. So you’ll see roughly one quarter of the thread. Right now, to get the context for the discussion you need to go take a look at the archives of the various people and try to piece the conversation together. You have to do this one tweeter at a time. Or you could temporarily follow the people involved and then page backwards through time to see the flow of tweets. With some work on the server side, Twitter could let you see this using the Twitter search interface (you’d need to put in the names of the various parties though).

It would be much simpler and much cooler to just to click a link besides a user’s name and get that user’s POV. You’d see what they see, except for the people whose tweets are private and which you’re prevented from seeing by the Twitter permission system. Not only could you see more or all of a conversation, I bet it would be really interesting to see Twitter from someone else’s POV. You could click on the @Replies tab to see all replies to that user, etc. There’s no reason why not – it’s all public data, and you can easily fetch the @replies using the search interface. I think wandering around inside the Twitterverse jumping from the POV of one identity to another would be fascinating. It reminds me of wandering around inside the wayback machine, except it’s the present.

That would all be pretty easy to implement, even for a 3rd party using the Twitter API. It would be nice if Twitter were to implement it themselves. I could do the basics myself in a few hours, but I’d rather not. This is also something that could be accessed via a Firefox extension or Greasemonkey – install it and get an extra button next to every tweet. The button switches you to the POV of the tweeter.

All we need is someone to build it.

I have several more Twitter blog posts I’d love to write. The most interesting, to me, is all about evolutionary biology, sex, and the meaning of life itself. But no time, no time. I’ve finally added a Twitter category to this blog, and was surprised to find 14 posts that fit it. Am I obsessed?

As usual, make sure you follow me :-)

Passion and the creation of highly non-uniform value

Monday, November 10th, 2008

Here, finally, are some thoughts on the creation of value. I don’t plan to do as good a job as the subject merits, but if I don’t take a rough stab at it, it’ll never happen.

I’ll first explain what I mean by “the creation of highly non-uniform value”. I’m talking about ideas that create a lot of (monetary) value for a very small number of people. If you made a graph and on the X axis put all the people in the world, in sorted order of how much they make from an idea, and on the Y axis you put value they each receive, we’re talking about distributions that look like the image on the right.

In other words, a setting in which a very small number of people try to get extremely rich. I.e., startup founders, a few key employees, their investors, and their investors’ investors. BTW, I don’t want to talk about the moral side of this, if there is one. There’s nothing to stop the obscenely rich from giving their money away or doing other charitable things with it.

So let’s just accept that many startup founders, and (in theory) all venture investors, are interested in turning ideas into wealth distributions that look like the above.

I was partly beaten to the punch on this post by Paul Graham in his essay Why There Aren’t More Googles? Paul focused on VC caution, and with justification. But there’s another important part of the answer.

One of the most fascinating things I’ve heard in the last couple of years is an anecdote about the early Google. I wrote about it in an earlier article, The blind leading the blind:

…the Google guys were apparently running around search engine companies trying to sell their idea (vision? early startup?) for $1M. They couldn’t find a buyer. What an extraordinary lack of.. what? On the one hand you want to laugh at those idiot companies (and VCs) who couldn’t see the huge value. OK, maybe. But the more extraordinary thing is that Larry Page and Sergei Brin couldn’t see it either! That’s pretty amazing when you think about it. Even the entrepreneurs couldn’t see the enormous value. They somehow decided that $1M would be an acceptable deal. Talk about a lack of vision and belief.

So you can’t really blame the poor VCs or others who fail to invest. If the founding tech people can’t see the value and don’t believe, who else is going to?

I went on to talk about what seemed like it might be a necessary connection between risk and value.


Image: Lost Tulsa

Following on…

After more thought, I’m now fairly convinced that I was on the right track in that post.

It seems to me that the degree to which a highly non-uniform wealth distribution can be created from an idea depends heavily on how non-obvious the value of the idea is.

If an idea is obviously valuable, I don’t think it can create highly non-uniform wealth. That’s not to say that it can’t create vast wealth, just that the distribution of that wealth will be more widely spread. Why is that the case? I think it’s true simply because the value will be apparent to many people, there will be multiple implementations, and the value created will be spread more widely. If the value of an idea is clear, others will be building it even as you do. You might all be very successful, but the distribution of created value will be more uniform.

Obviously it probably helps if an idea is hard to implement too, or if you have some other barrier to entry (e.g., patents) or create a barrier to adoption (e.g., users getting positive reinforcement from using the same implementation).

I don’t mean to say that an idea must be uniquely brilliant, or even new, to generate this kind of wealth distribution. But it needs to be the kind of proposition that many people look at and think “that’ll never work.” Even better if potential competitors continue to say that 6 months after launch and there’s only gradual adoption. Who can say when something is going to take off wildly? No-one. There are highly successful non-new ideas, like the iPod or YouTube. Their timing and implementation were somehow right. They created massive wealth (highly non-uniformly distributed in the case of YouTube), and yet many people wrote them off early on. It certainly wasn’t plain sailing for the YouTube founders – early adoption was extremely slow. Might Twitter, a pet favorite (go on, follow me), create massive value? Might Mahalo? Many people would have found that idea ludicrous 1-2 years ago – but that’s precisely the point. Google is certainly a good example – search was supposedly “done” in 1998 or so. We had Alta Vista, and it seemed great. Who would’ve put money into two guys building a search engine? Very few people.

If it had been obvious the Google guys were doing something immensely valuable, things would have been very different. But they traveled around to various companies (I don’t have this first hand, so I’m imagining), showing a demo of the product that would eventually create $100-150B in value. It wasn’t clear to anyone that there was anything like that value there. Apparently no-one thought it would be worth significantly more that $1M.

I’ve come to the rough conclusion that that sort of near-universal rejection might be necessary to create that sort of highly non-uniform wealth distribution.

There are important related lessons to be learned along these lines from books like The Structure of Scientific Revolutions and The Innovator’s Dilemma.

Now back to Paul’s question: Why aren’t there more Googles?

Part of the answer has to be that value is non-obvious. Given the above, I’d be willing to argue (over beer, anyway) that that’s almost by definition.

So if value is non-obvious, even to the founders, how on earth do things like this get created?

The answer is passion. If you don’t have entrepreneurs who are building things just from sheer driving passion, then hard projects that require serious energy, sacrifice, and risk-taking, simply wont be built.

As a corollary, big companies are unlikely to build these things – because management is constantly trying to assess value. That’s one reason to rue the demise of industrial research, and a reason to hope that cultures that encourage people to work on whatever they want (e.g., Google, Microsoft research) might be able to one day stumble across this kind of value.

This gets me to a recent posting by Tim Bray, which encourages people to work on things they care about.

It’s not enough just to have entrepreneurs who are trying to create value. As I’m trying to say, practically no-one can consistently and accurately predict where undiscovered value lies (some would argue that Marc Andreessen is an exception). If it were generally possible to do so, the world would be a very different place – the whole startup scene and venture/angel funding system would be different, supposing they even existed. Even if it looks like a VC or entrepreneur can infallibly put their finger on undiscovered value, they probably can’t. One-time successful VCs and entrepreneurs go on to attract disproportionately many great companies, employees, funding, etc., the next time round. You can’t properly separate their raw ability to see undiscovered value from the strong bias towards excellence in the opportunities they are later afforded. Successful entrepreneurs are often refreshingly and encouragingly frank about the role of luck in their success. They’re done. VCs are much less sanguine – they’re supposed to have natural talent, they’re trying to manufacture the impression that they know what they’re doing. They have to do that in order to get their limited partners to invest in their funds. For all their vaunted insight, roughly only 25% of VCs provide returns that are better than the market. The percentage generating huge returns will of course be much smaller, as in turn will be those doing so consistently. I reckon the whole thing’s a giant crap shoot. We may as well all admit it.

I have lots of other comments I could make about VCs, but I’ll restrict myself to just one as it connects back to Paul’s article.

VCs who claim to be interested in investing in the next Google cannot possibly have the next Google in their portfolio unless they have a company whose fundamental idea looks like it’s unlikely to pan out. That doesn’t mean VCs should invest in bad ideas. It means that unless VCs make bets on ideas that look really good – but which are e.g., clearly going to be hard to build, will need huge adoption to work, appear to be very risky long-shots, etc. – then they can’t be sitting on the next Google. It also doesn’t mean VCs must place big bets on stuff that’s highly risky. A few hundred thousand can go a long way in a frugal startup.

I think this is a fundamental tradeoff. You’ll very frequently hear VCs talk about how they’re looking for companies that are going to create massive value (non-uniformly distributed, naturally), with massive markets, etc. I think that’s pie in the sky posturing unless they’ve already invested in, or are willing to invest in, things that look very risky. That should be understood. And so a question to VCs from entrepreneurs and limited partners alike: if you claim to be aiming to make massive returns, where are your necessary correspondingly massively risky investments? Chances are you wont find any.

There is a movement in the startup investment world towards smaller funds that make smaller investments earlier. I believe this movement is unrelated to my claim about non-obviousness and highly non-uniform returns. The trend is fuelled by the realization that lots of web companies are getting going without the need for traditional levels of financing. If you don’t get in early with them, you’re not going to get in at all. A big fund can’t make (many) small investments, because their partners can’t monitor more than a handful of companies. So funds that want to play in this area are necessarily smaller. I think that makes a lot of sense. A perhaps unanticipated side effect of this is that things that look like they may be of less value end up getting small amounts of funding. But on the whole I don’t think there’s a conscious effort in that direction – investors are strongly driven to select the least risky investment opportunities from the huge number of deals they see. After all, their jobs are on the line. You can’t expect them to take big risks. But by the same token you should probably ignore any talk of “looking for the next Google”. They talk that way, but they don’t invest that way.

Finally, if you’re working on something that’s being widely rejected or whose value is being widely questioned, don’t lose heart (instead go read my earlier posting) and don’t waste your time talking to VCs. Unless they’re exceptional and serious about creating massive non-uniformly distributed value, and they understand what that involves, they certainly wont bite.

Instead, follow your passion. Build your dream and get it out there. Let the value take care of itself, supposing it’s even there. If you can’t predict value, you may as well do something you really enjoy.

Now I’m working hard to follow my own advice.

I had to learn all this the hard way. I spent much of 2008 on the road trying to get people to invest in Fluidinfo, without success. If you’re interested to know a little more, earlier tonight I wrote a Brief history of an idea to give context for this posting.

That’s it for now. Blogging is a luxury I can’t afford right now, not that I would presume to try to predict which way value lies.

Twitter’s amazing stickiness (with a caveat)

Friday, October 31st, 2008

I just followed a link to a site that shows the date of the first tweet of 50 early Twitter users. I wondered how many of these early users were still active users, and guessed many would be.

Instead of going and fetching each user’s last tweet by hand, I wrote a little shell script to do all the work:

for name in \
  `curl -s http://myfirsttweet.com/oldest.php |
   perl -p -e ‘s,<a href="http://myfirsttweet.com/1st/(\w+)">,\nNAME:\t$1\n,g’ |
   egrep ‘^NAME:’ |
   cut -f2 |
   uniq`
do
    echo $name \
      `curl -s "http://twitter.com/statuses/user_timeline/$name.xml?count=1" |
       grep created_at |
       cut -f2 -d\> |
       cut -f1 -d\<`
done
 

Who wouldn’t want to be a (UNIX) programmer!?

And the output, massaged into an HTML table:

User Last tweeted on
jack Thu Oct 30 03:41:49 +0000 2008
biz Thu Oct 30 22:24:12 +0000 2008
Noah Tue Oct 28 22:56:15 +0000 2008
adam Thu Oct 30 21:34:56 +0000 2008
tonystubblebine Fri Oct 31 00:53:38 +0000 2008
dom Thu Oct 30 20:36:31 +0000 2008
rabble Fri Oct 31 00:56:28 +0000 2008
kellan Fri Oct 31 00:32:44 +0000 2008
sarahm Thu Oct 30 22:45:37 +0000 2008
dunstan Thu Oct 30 23:59:57 +0000 2008
stevej Fri Oct 31 00:12:03 +0000 2008
lemonodor Thu Oct 30 18:21:43 +0000 2008
blaine Wed Oct 29 23:52:06 +0000 2008
rael Fri Oct 31 01:02:58 +0000 2008
bob Fri Oct 31 00:39:18 +0000 2008
graysky Fri Oct 31 00:23:21 +0000 2008
veen Thu Oct 30 19:47:40 +0000 2008
dens Fri Oct 31 00:13:12 +0000 2008
heyitsnoah Thu Oct 30 20:09:35 +0000 2008
rodbegbie Thu Oct 30 23:42:39 +0000 2008
astroboy Thu Oct 30 22:07:50 +0000 2008
alba Thu Oct 30 16:06:29 +0000 2008
kareem Thu Oct 30 20:20:14 +0000 2008
gavin Thu Oct 30 17:48:45 +0000 2008
nick Fri Oct 31 01:17:29 +0000 2008
psi Thu Oct 30 20:40:53 +0000 2008
vertex Fri Oct 31 00:44:09 +0000 2008
mulegirl Fri Oct 31 00:31:05 +0000 2008
thedaniel Thu Oct 30 20:00:31 +0000 2008
myles Thu Oct 30 15:50:31 +0000 2008
mike ftw Fri Oct 31 00:28:00 +0000 2008
stumblepeach Thu Oct 30 23:20:06 +0000 2008
bunch Sat Oct 25 20:46:42 +0000 2008
adamgiles com Thu Apr 10 17:22:52 +0000 2008
naveen Thu Oct 30 23:24:23 +0000 2008
nph Fri Oct 31 01:53:13 +0000 2008
caterina Tue Oct 28 18:07:32 +0000 2008
rafer Thu Oct 30 19:23:50 +0000 2008
ML Thu Oct 30 15:31:47 +0000 2008
brianoberkirch Thu Oct 30 20:21:43 +0000 2008
joelaz Thu Oct 30 22:03:59 +0000 2008
arainert Fri Oct 31 01:18:43 +0000 2008
tony Sun Oct 26 18:16:02 +0000 2008
brianr Fri Oct 31 01:57:27 +0000 2008
prash Tue Oct 28 22:14:24 +0000 2008
danielmorrison Thu Oct 30 21:37:41 +0000 2008
slack Fri Oct 31 01:26:08 +0000 2008
mike9r Thu Oct 30 21:17:29 +0000 2008
monstro Thu Oct 30 22:28:46 +0000 2008
mat Fri Oct 31 00:26:22 +0000 2008

Wow… look at those dates. Only one of these people has failed to update in the last week!

Here’s the caveat. We don’t know how many early Twitter users are in the My First Tweet database. The data looks suspicious: there are only 50 Twitter users in a 7 month period? That can’t be right. So it’s possible the My First Tweet database is built by finding currently active tweeters and then looking back to their first post. If so, my table doesn’t say much about stickiness.

But I find it fairly impressive in any case.

Digging into Twitter following

Monday, October 13th, 2008

TwitterThis is just a quick post. I have a ton of things I could say about this, but they’ll have to wait – I need to do some real work.

Last night and today I wrote some Python code to dig into the follower and following sets of Twitter users.

I also think I understand better why Twitter is so compelling, but that’s going to have to wait for now too.

You give my program some Twitter user names and it builds you a table showing numbers of followers, following etc. for each user. It distinguishes between people you follow and who don’t follow you, and people who follow you but whom you don’t follow back.

But the really interesting thing is to look at the intersection of some of these sets between users.

For example, if I follow X and they don’t follow me back, we can assume I have some interest in X. So if am later followed by Y and it turns out that X follows Y, I might be interested to know that. I might want to follow Y back just because I know it might bring me to the attention of X, who may then follow me. If I follow Y, I might want to publicly @ message him/her, hoping that he/she might @ message me back, and that X may see it and follow me.

Stuff like that. If you think that sort of thing isn’t important, or is too detailed or introspective, I’ll warrant you don’t know much about primate social studies. But more on that in another posting too.

As another example use, I plan to forward the mails Twitter sends me telling me someone new is following me into a variant of my program. It can examine the sets of interest and weight them. That can give me an automated recommendation of whether I should follow that person back – or just do the following for me.

There are lots of directions you could push this in, like considering who the person had @ talked to (and whether those people were followers or not) and the content of their Tweets (e.g., do they talk about things I’m interested or not interested in?).

Lots.

For now, here are links to a few sample runs. Apologies to the Twitter users I’ve picked on – you guys were on my screen or on my mind (following FOWA).

I’d love to turn these into nice Euler Diagrams but I didn’t find any decent open source package to produce them.

I’m also hoping someone else (or other people) will pick this up and run with it. I’ve got no time for it! I’m happy to send the source code to anyone who wants it. Just follow me on Twitter and ask for it.

Example 1: littleidea compared to sarawinge.
Example 2: swardley compared to voidspace.
Example 3: aweissman compared to johnborthwick.

And finally here’s the result for deWitt, on whose Twitter Python library I based my own code. This is the output you get from the program when you only give it one user to examine.

More soon, I guess.

How many users does Twitter have?

Monday, October 13th, 2008

Inclusion/Exclusion

Here’s a short summary of a failed experiment using the Principle of Inclusion/Exclusion to estimate how many users Twitter has. I.e., there’s no answer below, just the outline of some quick coding.

I was wondering about this over cereal this morning. I know some folks at Twitter, and I know some folks who have access to the full tweet database, so I could perhaps get that answer just by asking. But that wouldn’t be any fun, and I probably couldn’t blog about it.

I was at FOWA last week and it seemed that absolutely everyone was on Twitter. Plus, they were active users, not people who’d created an account and didn’t use it. If Twitter’s usage pattern looks anything like a Power Law as we might expect, there will be many, many inactive or dormant accounts for every one that’s moderately active.

BTW, I’m terrycojones on Twitter. Follow me please, I’m trying to catch Jason Calacanis.

You could have a crack at answering the question by looking at Twitter user id numbers via the API and trying to estimate how many users there are. I did play with that at one point at least with tweet ids, but although they increase there are large holes in the tweet id space. And approaches like that have to go through the Twitter API, which limits you to a mere 70 requests per hour – not enough for any serious (and quick) probing.

In any case, I was looking at the Twitter Find People page. Go to the Search tab and you can search for users.

I searched for the single letter A, and got around 109K hits. That lead me to think that I could get a bound on Twitter’s size using the Principle of Inclusion/Exclusion (PIE). (If you don’t know what that is, don’t be intimidated by the math – it’s actually very simple, just consider the cases of counting the size of the union of 2 and 3 sets). The PIE is a beautiful and extremely useful tool in combinatorics and probability theory (some nice examples can be found in Chapter 3 of the introductory text Applied Combinatorics With Problem Solving). The image above comes from the Wikipedia page.

To get an idea of how many Twitter users there are, we can add the number of people with an A in their name to the number with a B in their name, …., to the number with a Z in their name.

That will give us an over-estimate though, as names typically have many letters in them. So we’ll be counting users multiple times in this simplistic sum. That’s where the PIE comes in. The basic idea is that you add the size of a bunch of sets, and then you subtract off the sizes of all the pairwise intersections. Then you add on the sizes of all the triple set intersections, and so on. If you keep going, you get the answer exactly. If you stop along the way you’ll have an upper or lower bound.

So I figured I could add the size of all the single-letter searches and then adjust that downwards using some simple estimates of letter co-occurrence.

That would definitely work.

But then the theory ran full into the reality of Twitter.

To begin with, Twitter gives zero results if you search for S or T. I have no idea why. It gives a result for all other (English) letters. My only theory was that Twitter had anticipated my effort and the missing S and T results were their way of saying Stop That!

Anyway, I put the values for the 24 letters that do work into a Python program and summed them:

count = dict(a = 108938,
             b =  12636,
             c =  13165,
             d =  21516,
             e =  14070,
             f =   5294,
             g =   8425,
             h =   7108,
             i = 160592,
             j =   9226,
             k =  12524,
             l =   8112,
             m =  51721,
             n =  11019,
             o =   9840,
             p =   8139,
             q =   1938,
             r =  10993,
             s =      0,
             t =      0,
             u =   8997,
             v =   4342,
             w =   6834,
             x =   8829,
             y =   8428,
             z =   3245)

upperBoundOnUsers = sum(count.values())
print ‘Upper bound on number of users:’, upperBoundOnUsers

The total was 515,931.

Remember that that’s a big over-estimate due to duplicate counting.

And unless I really do live in a tech bubble, I think that number is way too small – even without adjusting it using the PIE.

(If we were going to adjust it, we could try to estimate how often pairs of letters co-occur in Twitter user names. That would be difficult as user names are not like normal words. But we could try.)

Looking at the letter frequencies, I found them really strange. I wrote a tiny bit more code, using the English letter frequencies as given on Wikipedia to estimate how many hits I’d have gotten back on a normal set of words. If we assume Twitter user names have an average length of 7, we can print the expected numbers versus the actual numbers like this:

# From http://en.wikipedia.org/wiki/Letter_frequencies
freq = dict(a = 0.08167,
            b = 0.01492,
            c = 0.02782,
            d = 0.04253,
            e = 0.12702,
            f = 0.02228,
            g = 0.02015,
            h = 0.06094,
            i = 0.06966,
            j = 0.00153,
            k = 0.00772,
            l = 0.04025,
            m = 0.02406,
            n = 0.06749,
            o = 0.07507,
            p = 0.01929,
            q = 0.00095,
            r = 0.05987,
            s = 0.06327,
            t = 0.09056,
            u = 0.02758,
            v = 0.00978,
            w = 0.02360,
            x = 0.00150,
            y = 0.01974,
            z = 0.00074)

estimatedUserNameLen = 7

for L in sorted(count.keys()):
    probNotLetter = 1.0 – freq[L]
    probOneOrMore = 1.0 – probNotLetter ** estimatedUserNameLen
    expected = int(upperBoundOnUsers * probOneOrMore)
    print "%s: expected %6d, saw %6d." % (L, expected, count[L])

Which results in:

a: expected 231757, saw 108938.
b: expected  51531, saw  12636.
c: expected  92465, saw  13165.
d: expected 135331, saw  21516.
e: expected 316578, saw  14070.
f: expected  75281, saw   5294.
g: expected  68517, saw   8425.
h: expected 183696, saw   7108.
i: expected 204699, saw 160592.
j: expected   5500, saw   9226.
k: expected  27243, saw  12524.
l: expected 128942, saw   8112.
m: expected  80866, saw  51721.
n: expected 199582, saw  11019.
o: expected 217149, saw   9840.
p: expected  65761, saw   8139.
q: expected   3421, saw   1938.
r: expected 181037, saw  10993.
s: expected 189423, saw      0.
t: expected 250464, saw      0.
u: expected  91732, saw   8997.
v: expected  34301, saw   4342.
w: expected  79429, saw   6834.
x: expected   5392, saw   8829.
y: expected  67205, saw   8428.
z: expected   2666, saw   3245.

You can see there are wild differences here.

While it’s clearly not right to be multiplying the probability of one or more of each letter appearing in a name by the 515,931 figure (because that’s a major over-estimate), you might hope that the results would be more consistent and tell you how much of an over-estimate it was. But the results are all over the place.

I briefly considered writing some code to scrape the search results and calculate the co-occurrence frequencies (and the actual set of letters in user names). Then I noticed that the results don’t always add up. E.g., search for C and you’re told there are 13,190 results. But the results come 19 at a time and there are 660 pages of results (and 19 * 660 = 12,540, which is not 13,190).

At that point I decided not to trust Twitter’s results and to call it quits.

A promising direction (and blog post) had fizzled out. I was reminded of trying to use AltaVista to compute co-citation distances between web pages back in 1996. AltaVista was highly variable in its search results, which made it hard to do mathematics.

I’m blogging this as a way to stop thinking about this question and to see if someone else wants to push on it, or email me the answer. Doing the above only took about 10-15 mins. Blogging it took at least a couple of hours :-(

Finally, in case it’s not clear there are lots of assumptions in what I did. Some of them:

  • We’re not considering non-English letters (or things like underscores, which are common) in user names.
  • The mean length of Twitter user names is probably not 7.
  • Twitter search returns user names that don’t contain the searched-for letter (instead, the letter appears in the user’s name, not the username).

Random thoughts on Twitter

Monday, June 9th, 2008

TwitterI’ve spent a lot of time thinking about Twitter this year. Here are a few thoughts at random.

Obviously Twitter have tapped into something quite fundamental, which at a high level we might simply call human sociability. We humans are primates, though there’s a remarkably strong tendency to forget or ignore this. We know a lot about the intensely social lives of our fellow primate species. It shouldn’t come as a surprise that we like to Twitter amongst ourselves too.

Here are a couple of interesting (to me) reasons for the popularity of Twitter.

One is that many people are in some sense atomized by the fact that many of us now work in an isolated way. Technical people who can do their work and communicate over the internet probably see less of their peers than others do. That’s just a general point, it’s not specific to Twitter or to 2008. It would have seemed unfathomably odd to humans 50 years ago to hear that many of us would be doing a large percentage of our work and social communication via machines, interacting with people who we don’t otherwise know, and who we rarely or never meet face to face. The rise of internet-based communication is obviously(?) helping to fill a gap created by this generational change.

The second point is specific to Twitter. Through brilliance or accident, the form of communication on Twitter is really special. Building a social network on nothing-implied asymmetric follower relationships is not something I would have predicted as leading to success. Maybe it worked, or could have all gone wrong, just due to random chance. But I’m inclined to believe that there’s more to it than that. Perhaps we’re all secretly voyeurs, or stickybeaks (nosy-parkers). Perhaps we like to see one half of conversations and be able to follow along if we like. Perhaps there’s a small secret thrill to promiscuously following someone and seeing if they follow you back. I don’t know the answer, but as I said above I do think Twitter have tapped into something interesting and strong here. There’s a property of us, we simple primates, that the Twitter model has managed to latch onto.

I think Twitter should change the dynamics for new users by initially assigning them ten random followers. New users can easily follow others, but if no-one is following them….. why bother? New user uptake would be much higher if they didn’t have the (correct) feeling that they were for some reason expected to want to Twitter in a vacuum. You announce a new program, called e.g., Twitter Guides and ask for people to volunteer to be guides (i.e., followers) of newbees. Lend a hand, make new friends, maybe get some followers yourself, etc. Lots of people would click to be a Guide. I bet this would change Twitter’s adoption dynamics. If you study things like random graph theory and dynamic systems, you know that making small changes to (especially initial) probabilities can have a dramatic effect on overall structure. If Twitter is eventually to reach a mass audience (whatever that means), it should be an uncontestable assertion that anything which significantly reduces the difficulty for new users to get into using it is very important.

Twitter should probably fix their reliability issues sometime soon.

I say “probably” because reliability and scaling are obviously not the most important things. Twitter has great value. It must have, or it would have lost its users long ago.

There’s a positive side to Twitter’s unreliability. People are amazed that the site goes down so often. Twitter gets snarled up in ways that give rise to a wide variety of symptoms. The result seems to be more attention, to make the service somehow more charming. It’s like a bad movie that you remember long afterwards because it wasn’t good. We don’t take Twitter for granted and move on the next service to pop up – we’re all busy standing around making snide remarks, playing armchair engineer, knowing that we too might face some of these issues, and talking, talking, talking. Twitter is a fascinating sight. Great harm is done by its unreliability, but the fact that their success so completely flies in the face of conventional wisdom is fascinating – and the fact that we find it so interesting and compelling a spectacle is fantastic for Twitter. They can fix the scaling issues, I hope. They should prove temporary. But the human side of Twitter, its character as a site, the site we stuck with and rooted for when times were so tough, the amazing little site that dropped to the canvas umpteen times but always got back to its feet, etc…. All that is permanent. If Twitter make it, they’re going to be more than just a web service. The public outages are like a rock musician or movie star doing something outrageous or threatening suicide – capturing attention. We’re drawn to the spectacle and the drama. We can’t help ourselves: it is our selves. We love it, we hate it, it brings us together to gnash our teeth when it’s down. But do we leave? Change the channel? No way.

Twitter is both the temperamental child rock star we love and, often, the medium by which we discuss it – an enviable position!

I’m reminded of a trick I learned during tens of thousands of miles of hitch-hiking. A great place to try for a lift is on a fairly high-speed curve on the on-ramp to the freeway / motorway / autopista / autoroute etc. Stand somewhere where a speeding car can only just manage a stop and only just manage to pull in away from the following traffic. Conventional wisdom tells you that you’ll never get a ride. But the opposite is true – you’ll get a ride extremely quickly. Invariably, the first thing the driver says when you get in is “Why on earth where you standing there? You’re very lucky I managed to stop. No-one would have ever picked you up standing there!” I’ve done this dozens of times. Twitter—being incredibly, unbelievably, frustratingly, unreliable and running contrary to all received wisdom—is a powerful spectacle. Human psyche is a funny thing. That’s a part of why it’s probably impossible to foretell success when mass adoption is required.

If I were running Twitter, apart from working to get the service to be more reliable, I’d be telling the engineering team to log everything. There’s a ton of value in the data flowing into Twitter.

Just as Google took internet search to a new level by link analysis, there’s another level of value in Twitter that I don’t think has really begun to be tapped yet.

PageRank, at least as I understand its early operation, ran a kind of iterative relaxation algorithm assigning and passing on credit via linked pages. A similar thing is clearly possible with Twitter, and some people have commented on this or tried to build little things that assign some form of score to users. But I think there’s a lot more that can be done. Because the Twitter API isn’t that powerful (mainly because you’re largely limited to querying as a single authorized user) and certainly because it’s rate-limited to just 70 API calls an hour, this sort of analysis will need to be done by Twitter themselves. I’m sure they’re well aware of that. Rate limiting probably helps them stay up, but it also means that the truly interesting and valuable stuff can’t be done by outsiders. I have no beef with that – I just wish Twitter would hurry up and do some of it.

Some examples in no order:

  • The followers to following ratio of a Twitter user is obviously a high-level measure of that user’s “importance” (in some Twitter sense of importance). But there’s more to it than that. Who are the followers? Who do they follow, who follows them? Etc. This leads immediately back to Google PageRank.
  • If a user gets followed by many people and doesn’t follow those people back, what does it say about the people involved? If X follows Y and Y then goes to look at a few pages of X’s history but does not then follow X, what do we know?
  • If X has 5K followers and re-tweets a twit of Y, how many of X’s followers go check out and perhaps follow Y? What kind of people are these? (How do you advertise to them, versus others?)
  • Along the lines of co-citation analysis, Twitter could build up a map showing you who you might follow. I.e., you can get pairwise distances between users X and Y by considering how many people they follow in common and how many they follow not-in-common. That would lead to a people you should be following that you’re not kind of suggestion.
  • Even without co-citation analysis (or similar), Twitter should be able to tell me about people that many of the people I follow are following but whom I am not following. I’d find that very useful.
  • Twitter could tell me why someone chooses to follow me. What were they looking at (if anything) before they decided to follow me? I.e., were they browsing the following list of someone else? Did they see my user name mentioned in a Tweet? Did they come in from an outside link? Would a premium Twitter user pay to have that information?
  • Twitter has tons of links. They know the news as it happens. They could easily create a news site like Digg.
  • In some sense the long tail of Twitter is where the value is. For instance, it doesn’t mean much if a user following 10K others follows someone. But if someone is following just 10 people, it’s much more significant. There’s more information there (probably). The Twitter mega users are in some way uninteresting – the more people they have following them and the more they follow, the less you really know (or care) about them. Yes, you could probably figure out more if you really wanted to, but if someone has 10K followers all you really know is that they’re probably famous in some way. If they add another 100 followers it’s no big deal. (I say all this a bit lightly and generally – the details might of course be fascinating and revealing – e.g., if you notice Jason Calacanis and Dave Winer have suddenly started @ messaging each other again it’s like IRC coming back from a network split :-))
  • Similarly if someone with a very high followers to following ratio follows a Twitter user who has just a couple of followers, it’s a safe bet that those two are somehow friends with a pre-existing relationship.
  • I bet you could do a pretty good job of putting Twitter users into boxes just based on their overall behavior, something like the 16 Myers-Briggs categories. Do you follow people back when they follow you? Do you @ answer people who @ address you (and Twitter knows when you’ve seen the original message)? Do you send @ messages to people (and how influential are those people)? Do those people @ you back (and how influential those people are says something about how interesting / provocative you are)? Do you follow tons and tons of people? Do you follow people and then un-follow them if they don’t follow you back? Do you follow random links in other people’s Twitters, and are those links accompanied by descriptive text or tinyurl links? Do you @ message people after you follow their links? Do your Twitter times follow a strict pattern, or are you on at all hours, or suddenly spending days without Twittering? Do you visit and just read much more than you tweet? How much old stuff do you read? Do you tend to talk in public or via DM? Are your tweets public?All that without even considering the content of your Twitters.
  • Could Twitter become a search engine? That’s not a 100% serious question, but it’s worth considering. I don’t mean just making the content of all tweet searchable, I mean it with some sort of ranking algorithm, again perhaps akin to PageRank. If you somehow rank results by the importance or closeness of the user whose tweets match the search terms, you might have something interesting.
  • Twitter also presumably know who’s talking about whom in the DM backchat. They can’t use that information in obvious way, but it’s of high value.

I could go on for hours, but that’s more than enough for now. I don’t feel like any of the above list is particularly compelling, but I do think the list of nice things they could be doing is extremely long and that Twitter have only just begun (at least publicly) to tap into the value they’re sitting on.

I think Google should buy Twitter. They have what Twitter needs: 1) engineering and scale, 2) link analysis and algorithm brilliance, and 3) they’re in a position to monetize the value illustrated above (via their search engine, that already has ads) without pissing off the Twitter community by e.g., running ads on Twitter. What percentage of Twitter users also use Google? I bet it’s very high.

Twitter dynamics: unfollowing guykawasaki, Scobleizer and cameronreilly

Saturday, March 22nd, 2008

cameronreillyI’ve only got so much time a day to read blogs, Twitters, etc.

With blogs I find that I tend to try to keep up with those that post at a frequency at or below what I can handle, irrespective of quality of content. There are lots of blogs that I really enjoy, but which post new material so often that I end up never going to their sites. E.g., BoingBoing or ReadWriteWeb. I tend to always go to new content at blogs I like that have about one new article a day. I have dozens of examples in both these categories.

With blogs it’s no problem if some of the sites you’re subscribed to have tons of content. If you never click through on the indicator that there are 500 unread postings, you never see them.

On Twitter though the dynamic is very different. I follow about 140 people. From time to time during the day – normally when I’m drinking a coffee like I am now, or eating food – I’ll go have a look at Twitter to see what’s up in the wider world.

Unlike with blogs, if someone posts hundreds of Twitter updates you’re going to see them all. You’re perhaps going to see something like the image above (click for larger version). That’s not what I want to see at all. I’m hoping to see a whole bunch of people posting a few things, not screen after screen of one person talking to many people I don’t know or follow. It’s worse than being in a room with someone talking loudly on a mobile phone, hearing just one side of the conversation – this is like being in a room with that same person, but they’re talking to multiple people at once.

So with some reluctance I have recently un-followed Scobelizer, guykawasaki and cameronreilly. I actually like much of their content, but they have much too much of an unbalancing effect on my overall Twitter experience.

Move along.

Anything for him but mindless good taste

Friday, March 7th, 2008

I have about ten things I could blog about today. Hopefully I wont.

I think I’m going to go out and make an impulse purchase a bit later. Can something be an impulse purchase if you blog about it first? I was in an Office Depot store today. Digital cameras are so cheap it’s ridiculous. Then throw in the value of the euro. For $129 I could pick up a 7M pixel Casio Exilim with a 3 inch screen. Why not? My old camera is a bit of a joke. Or there are nice Canon digital Elph cameras for $150. It’s hardly worth thinking about whether to buy one.

I’m getting glasses. Again. I had Lasik surgery in 2002 or so, and it’s been a wonderful 6 years. But my eyes are getting worse. I hated trying on glasses in the store today. I’ll hardly ever wear them I guess, but it’s clear (fuzzy?) to me that I’d be much better off with them.

I like women. They’re so much more interesting than men, not to mention a few other adjectives. I have a whole blog post on that one, but I’ll probably refrain.

I have two related postings: a book I’ll never write, and a Twitter  app I’ll never build. I should write them down. Twitter has so much interesting and valuable information in it. I wish their API was richer so that more things could be built. I hope they’re building some of them.

I still don’t understand why it’s considered valuable to have an API that many people build on, killing your service, if it can’t be easily monetized.

I’m booking yet another US trip. I went Silver on Delta in just 2 months this year, and am about to go Gold. This could be a Platinum year. BUT, I have to stop traveling and plant my ass on my chair in Barcelona and write more code. Have to. Must stop talking.

John Cleese’s speech at Graham Chapman’s funeral service is so moving. Can someone please do that for me?

My twitter stats

Monday, February 11th, 2008

twitter statsI seem to be done with Twitter, at least for now. The graphic shows my monthly usage graph (courtesy of tweetstats) – click for the full-sized image.

I do find Twitter valuable, but I don’t want to spend time on it. It’s a bit like TV or video games for me – I quite enjoy those things, but there are almost always better things to do.

I’ll probably subscribe to some form of Twitter alert or digest at some point. I do find it useful to know when people are coming to Barcelona. But I don’t want to monitor Twitter. Like IM, I find it too distracting and waste too much time just going to check if anything’s new.

Etc.

Hacking Twitter on JetBlue

Saturday, November 24th, 2007

I have much better and more important things to do than hack on my ideas for measuring Twitter growth.

But a man’s gotta relax sometime.

So I spent a couple of hours at JFK and then on the plane hacking some Python to pull down tweets (is this what other people call Twitter posts?), pull out their Twitter id and date, convert the dates to integers, write this down a pipe to gnuplot, and put the results onto a graph. I’ve nothing much to show right now. I need more data.

But the story with Twitter ids is apparently not that simple. While you can get tweets from very early on (like #20 that I pointed to earlier), and you can get things like #438484102 which is a recent one of mine, it’s not clear how the intermediate range is populated. Just to get a feel for it, I tried several loops like the following at the shell:

i=5000

while [ $i -lt 200000 ]
do
  wget –http-user terrycojones –http-passwd xxx \
    http://www.twitter.com/statuses/show/$i.xml
  i=`expr $i + 5000`
  sleep 1
done

Most of these were highly unsuccessful. I doubt that’s because there’s widespread deleting of tweets by users. So maybe Twitter are using ids that are not sequential.

Of course if I wasn’t doing this for the simple joy of programming I’d start by doing a decent search for the graph I’m trying to make. Failing that I’d look for someone else online with a bundle of tweets.

I’ll probably let this drop. I should let it drop. But once I get started down the road of thinking about a neat little problem, I sometimes don’t let go. Experience has taught me that it is usually better to hack on it like crazy for 2 days and get it over with. It’s a bit like reading a novel that you don’t want to put down when you know you really should.

One nice sub-problem is deciding where to sample next in the Twitter id space. You can maintain something like a heap of areas – where area is the size of the triangle defined by two tweets: their ids and dates. That probably sounds a bit obscure, but I understand it :-) Gradient of the growth curve is interesting – you probably want more samples when the gradient is changing fastest. Adding time between tweets to gradient gives you a triangle whose area you can measure. There are simpler approaches too, like uniform sampling, or some form of binary splitting of interesting regions of id space. Along the way you need to account for pages that give you a 404. That’s a data point about the id space too.

Twitter creeps in

Wednesday, November 21st, 2007

I often notice little things about how I work that I think point out value. One sign that a piece of UI is right is when you start to look for it in apps that don’t have it. For example, after I had started using mouse gestures in Opera I’d find myself wanting to make mouse gestures in other applications. When mice first started to have a wheel, I was skeptical. Support of the mouse wheel was not universal across applications. When I found myself trying to scroll with the mouse wheel in applications that didn’t support it, I knew it was right.

Tonight I came home and went to my machine. The first thing I did was to check what was going on in Twitter. That’s pretty interesting, at least for someone like me. I’ve been sending email on pretty much a daily basis for 25 years. It’s pretty much always the first thing I look at when I come back to my machine. Occasionally these days I find myself first going into Google reader to see what’s new, but that’s pretty rare and I might be looking for something specific. Tonight, I think for the first time, Twitter was where I went to – and not just for the general news, but for communications between and about people I know or am interested in. Much more interesting than looking through my email.

I’m one of those that thought Twitter was pretty silly when I first signed up (Dec 2006). I only used it once, and also found it intolerably slow. But it’s grown on me. And I find definite value there.

A few examples:

  1. I’d mailed Dick Costolo a few times in the past. Then I saw him twittering that he was drinking cortados. So I figured he must be in Spain. I mailed him, and he was. As a result I ended up at the FOWA conference in London the next day and met a bunch of people.
  2. On Tuesday I went out and bought a Wii in Manhattan to take back to my kids in Spain. I twittered about heading out to do it. I got an email a bit later from @esteve telling me to take the Wii back as they are region-locked. So I did.
  3. A week or so ago I was reading some tweets and noticed that someone had just been out to dinner in Manhattan with someone else that I wanted to meet. So I sent a mail to the first person and was soon swapping mails with the second.
  4. I’ve noticed about 5 times that interesting people were going to be in Barcelona and so I’ve mailed them out of the blue. That’s really good – people on holiday are often happy to have a beer and a chat. I’d have had no idea they were going to literally be outside my door were it not for Twitter.

Flakey Twitter and the use of consecutive ids

Friday, November 16th, 2007

Twitter was just inaccessible for maybe a couple of hours. Prior to that there was a 9-day gap in their timeline, noticed by at least a few people. I quite regularly have twitters I send not show up at all.

I wonder what could be going on over there? Things certainly don’t feel very stable.

A friend signed up tonight. Using the Twitter API you can see her id. It’s a bit over 10 million. You can also see the id of her first twitter, a bit over 417 million. The earliest twitter available on the system is number 20 “just setting up my twttr” sent at 20:50:14 on Tue Mar 21 2006 by Jack Dorsey who has user id 12 (the lowest user I’ve seen).

Given that Twitter seem to be using consecutive ids for users and twitters, and that you can pull dates out of their API, it would be pretty easy to make graphs showing growth in users and twitters over time. You could probably also infer downtime by looking for periods when no twitters appeared. This would be pretty easy too. Beyond a certain point in time it would be very accurate (i.e., when there are so many twitters arriving that a twittering gap is suspicious), and you could calculate confidence estimates.

I don’t have time for all that though.

But I wonder if Google did something like that as part of their competitive analysis when they decided to buy Jaiku, or if Twitter’s investors did it, and how the numbers would match up with whatever Twitter management might claim. I’ve no idea or opinion at all about any of that btw. But I don’t think I’d be exposing all that information by using consecutive ids for users and their twitters.

Twittering from inside emacs

Monday, November 12th, 2007

I do everything I can from inside emacs. Lately I’ve been thinking a bit about the Twitter API and social graphs.

Tonight I went and grabbed python-twitter, a Python API for Twitter. Then I wrote a quick python script to post to Twitter:

import sys
import twitter
twit = twitter.Api(username=‘terrycojones’, password=‘xxx’,
                        input_encoding=‘iso-8859-1′)
twit.PostUpdate(sys.argv[1])

and an equally small emacs lisp function to call it:

(defun tweet (mesg)
  (interactive "MTweet: ")
  (call-process "tweet" nil 0 nil mesg))

so now I can M-x tweet from inside emacs, or simply run tweet from the shell.

Along the way I wrote some simple emacs hook functions to tweet whenever I visited a new file or switched into Python mode. I’m sure that’s not so interesting to my faithful Twitter followers, but it does raise interesting questions. I also thought about adding a mail-send-hook function to Twitter every time I send a mail (and to whom). Probably not a good idea.

You can follow me in Twitter. Go on, you know you want to.

Anyway, Twitter is not the right place to publish information like this. Something more general would be nicer…

Twitterquake

Thursday, November 1st, 2007

Fastest news in the West? Twitter wins hands down.

twitterquake