Archive for the ‘companies’ Category

At what point does an Amazon EC2 reserved instance become worth it?

Friday, January 8th, 2010

If you purchase an Amazon EC2 reserved instance, you’ll pay a certain amount up front (pricing). If you don’t use the instance much, it will be more expensive per hour than a regular on-demand instance. E.g., if you paid $227.50 to reserve a small instance for a year but then only used it for a single day, you’d be paying almost $10/hr, and it would obviously be much cheaper to use an on-demand instance at just 8.5 cents per hour.

OTOH, if you ran a small instance for a year at the on-demand price, you’d pay $745 and it would obviously be cheaper to pay the up-front reservation price ($227.50) plus a year of the low per-hour pricing (365 * 24 * $0.03), or $490.

So for how long do you have to run an instance in order for it to be cheaper to pay for a reserved instance? (Note that I’m ignoring the time value of money, what you might do with the up-front money in the meantime if you didn’t give it to Amazon in advance, etc.)

The answer is pretty simple: for a one-year reservation you need to run the instance for about 6 months to make it worthwhile. For a three-year reservation you need to run the instance for at least 3 months per year, on average.
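The break-even point comes from setting the two costs equal: upfront + reserved_rate * h = on_demand_rate * h, so h = upfront / (on_demand_rate - reserved_rate). Here’s a small sketch of that calculation using the one-year small-instance figures quoted above; `break_even_hours` is just an illustrative name. Like the post, it ignores the time value of money. The three-year up-front price isn’t quoted here, so you’d need to plug in Amazon’s figure for that case.

```python
HOURS_PER_YEAR = 365 * 24


def break_even_hours(upfront, on_demand_rate, reserved_rate):
    """Hours of use at which a reservation and on-demand cost the same.

    Solves upfront + reserved_rate * h == on_demand_rate * h for h.
    Ignores the time value of money, as the post does.
    """
    if on_demand_rate <= reserved_rate:
        raise ValueError("on-demand rate must exceed the reserved rate")
    return upfront / (on_demand_rate - reserved_rate)


# One-year small instance, N. Virginia prices as quoted in the post.
hours = break_even_hours(227.50, 0.085, 0.03)
months = hours / HOURS_PER_YEAR * 12
print("%.0f hours, about %.1f months" % (hours, months))
# prints: 4136 hours, about 5.7 months
```

For a three-year term, run the same function with the three-year up-front fee and divide the result by 3 to get the average number of hours per year you’d need to use the instance.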

Here’s a fragment from a simple spreadsheet I made, based on the US N. Virginia prices:

[Image: spreadsheet fragment (ec2)]

Fault-tolerant Python Twisted classes for getting all Twitter friends or followers

Thursday, October 22nd, 2009

It’s been forever since I blogged here. I just wrote a little Python to grab all of a user’s friends or followers (or just their user ids). It uses Twisted, of course. There were two main reasons for doing this: 1) I wanted all friends/followers, not just the first batch returned by the Twitter API, and 2) I wanted code that is fairly robust in the face of various 5xx HTTP errors (I regularly experience INTERNAL_SERVER_ERROR, BAD_GATEWAY, and SERVICE_UNAVAILABLE).

If you want to use the code below and you’re not familiar with the Twitter API, consider whether you can use the FriendsIdFetcher and FollowersIdFetcher classes, as they need far fewer requests (you get 5000 results per API call, instead of 100). If you can live with user ids and the occasional fetch of a full user, you’ll probably make far fewer API calls.

For the FriendsFetcher and FollowersFetcher classes, you get back a list of dictionaries, one per user. For FriendsIdFetcher and FollowersIdFetcher you get a list of Twitter user ids.

Of course there’s no documentation. Feel free to ask questions in the comments. Download the source.

import sys

from twisted.internet import defer
from twisted.web import client, error, http
   
if sys.hexversion >= 0x20600f0:
    import json
else:
    import simplejson as json

class _Fetcher(object):
    baseURL = 'http://twitter.com/'
    URITemplate = None # Override in subclass.
    dataKey = None # Override in subclass.
    maxErrs = 10
    okErrs = (http.INTERNAL_SERVER_ERROR,
              http.BAD_GATEWAY,
              http.SERVICE_UNAVAILABLE)
   
    def __init__(self, name):
        assert self.baseURL.endswith('/')
        self.results = []
        self.errCount = 0
        self.nextCursor = -1
        self.deferred = defer.Deferred()
        self.URL = self.baseURL + (self.URITemplate % {'name': name})

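    def _fail(self, failure):
        # Retry on the transient 5xx errors in okErrs, up to maxErrs times.
        # Any other failure, including non-HTTP ones such as a refused
        # connection, is passed on so that self.deferred always fires.
        self.errCount += 1
        if (failure.check(error.Error) and
            self.errCount < self.maxErrs and
            int(failure.value.status) in self.okErrs):
            self.fetch()
        else:
            self.deferred.errback(failure)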
       
    def _parse(self, result):
        try:
            data = json.loads(result)
            self.nextCursor = data.get('next_cursor')
            self.results.extend(data[self.dataKey])
        except Exception:
            self.deferred.errback()
        else:
            self.fetch()
           
    def _deDup(self):
        raise NotImplementedError('Override _deDup in subclasses.')

    def fetch(self):
        if self.nextCursor:
            d = client.getPage(self.URL + '?cursor=%s' % self.nextCursor)
            d.addCallback(self._parse)
            d.addErrback(self._fail)
        else:
            self.deferred.callback(self._deDup())
        return self.deferred

class _FriendsOrFollowersFetcher(_Fetcher):
    dataKey = u'users'
   
    def _deDup(self):
        seen = set()
        result = []
        for userdict in self.results:
            uid = userdict['id']
            if uid not in seen:
                result.append(userdict)
                seen.add(uid)
        return result

class _IdFetcher(_Fetcher):
    dataKey = u'ids'
   
    def _deDup(self):
        # Keep the ids in the order we received them.
        seen = set()
        result = []
        for uid in self.results:
            if uid not in seen:
                result.append(uid)
                seen.add(uid)
        return result

class FriendsFetcher(_FriendsOrFollowersFetcher):
    URITemplate = 'statuses/friends/%(name)s.json'

class FollowersFetcher(_FriendsOrFollowersFetcher):
    URITemplate = 'statuses/followers/%(name)s.json'

class FriendsIdFetcher(_IdFetcher):
    URITemplate = 'friends/ids/%(name)s.json'

class FollowersIdFetcher(_IdFetcher):
    URITemplate = 'followers/ids/%(name)s.json'
 

Usage is dead simple:

fetcher = FriendsFetcher('terrycojones')
d = fetcher.fetch()
d.addCallback(…)  # etc.
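If you’re not using Twisted, the same idea can be written as a plain loop. This is a minimal synchronous sketch, not a replacement for the classes above: follow `next_cursor` from -1 until it comes back as 0, retry a bounded number of times on transient 5xx errors, then de-duplicate while keeping arrival order. `fetch_all`, `get_page` and `TransientError` are hypothetical names; `get_page(cursor)` stands in for whatever HTTP call you use and is assumed to return the decoded JSON page.

```python
class TransientError(Exception):
    """Hypothetical: raised by get_page for a 500, 502 or 503 response."""


def fetch_all(get_page, data_key, key=lambda item: item, max_errs=10):
    """Collect every page of a cursored result set.

    get_page(cursor) is a hypothetical callable returning a decoded JSON
    dict with a list under data_key and a 'next_cursor' value (0 at the end).
    key(item) picks the value used for de-duplication.
    """
    results = []
    cursor = -1  # Twitter-style cursoring starts at -1.
    errs = 0
    while cursor:
        try:
            page = get_page(cursor)
        except TransientError:
            errs += 1
            if errs >= max_errs:
                raise  # Give up after too many errors.
            continue  # Retry the same cursor immediately.
        results.extend(page[data_key])
        cursor = page.get('next_cursor')

    # Drop duplicates but keep the order in which items arrived.
    seen = set()
    unique = []
    for item in results:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique
```

For lists of full user dictionaries, pass `key=lambda user: user['id']` so that duplicates are detected by user id.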
 

Enjoy.

Python code for retrieving all your tweets

Wednesday, June 24th, 2009

Here’s a little Python code to pull back all of a user’s Twitter tweets. Make sure you read the notes at the bottom if you want to use it.

import sys, twitter, operator
from dateutil.parser import parse

twitterURL = 'http://twitter.com'

def fetch(user):
    data = {}
    api = twitter.Api()
    max_id = None
    total = 0
    while True:
        statuses = api.GetUserTimeline(user, count=200, max_id=max_id)
        newCount = ignCount = 0
        for s in statuses:
            if s.id in data:
                ignCount += 1
            else:
                data[s.id] = s
                newCount += 1
        total += newCount
        print >>sys.stderr, "Fetched %d/%d/%d new/old/total." % (
            newCount, ignCount, total)
        if newCount == 0:
            break
        # Ask for strictly older tweets next time round.
        max_id = min([s.id for s in statuses]) - 1
    return data.values()

def htmlPrint(user, tweets):
    for t in tweets:
        t.pdate = parse(t.created_at)
    key = operator.attrgetter('pdate')
    tweets = sorted(tweets, key=key)
    f = open('%s.html' % user, 'wb')
    print >>f, """<html><title>Tweets for %s</title>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
    <body><small>""" % user
    for i, t in enumerate(tweets):
        print >>f, '%d. %s <a href="%s/%s/status/%d">%s</a><br/>' % (
            i, t.pdate.strftime('%Y-%m-%d %H:%M'), twitterURL,
            user, t.id, t.text.encode('utf8'))
    print >>f, '</small></body></html>'
    f.close()
   
if __name__ == '__main__':
    user = ‘terrycojones’ if len(sys.argv) < 2 else sys.argv[1]
    data = fetch(user)
    htmlPrint(user, data)
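The paging in fetch() relies on max_id: each request asks only for tweets with ids below the smallest id in the previous batch, and the loop stops once a batch adds nothing new. Here’s that loop pulled out into a standalone sketch so it can be tried without the network. `fetch_timeline` and `get_page` are hypothetical names; `get_page` stands in for `api.GetUserTimeline`, and tweets are plain dicts with an `'id'` key rather than python-twitter objects.

```python
def fetch_timeline(get_page, count=200):
    """Page backwards through a timeline with max_id until nothing new arrives.

    get_page(max_id, count) is a hypothetical stand-in for one API call: it
    returns up to `count` statuses (dicts with an 'id' key), newest first,
    limited to ids <= max_id when max_id is not None.
    """
    data = {}
    max_id = None
    while True:
        statuses = get_page(max_id, count)
        new = [s for s in statuses if s['id'] not in data]
        if not new:
            break
        for s in new:
            data[s['id']] = s
        # Next request: only statuses strictly older than this batch's oldest.
        max_id = min(s['id'] for s in statuses) - 1
    return list(data.values())
```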
 

Notes:

Fetch all of a user’s tweets and write them to a file username.html (where username is given on the command line).

Output is to a file instead of to stdout as tweet texts are unicode and sys.stdout.encoding is ascii on my machine, which prevents printing non-ASCII chars.
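The script handles this by encoding each tweet to UTF-8 itself and writing bytes. An alternative is to let `io.open` (available in Python 2.6+ and 3) do the encoding for you, as in this sketch; `write_utf8` is a hypothetical helper, not part of the script above.

```python
import io


def write_utf8(path, lines):
    """Write unicode lines to path as UTF-8, whatever sys.stdout.encoding is."""
    # newline='\n' stops io.open translating line endings (e.g. on Windows).
    with io.open(path, 'w', encoding='utf-8', newline='\n') as f:
        for line in lines:
            f.write(line + u'\n')
```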

This code uses the Python-Twitter library. You need to get (via SVN) the very latest version, and then you need to fix a tiny bug, described here. Or wait a while and the SVN trunk will be patched.

This worked flawlessly for my 2,300 tweets, but only retrieved about half the tweets of someone who had over 7,000. I’m not sure what happened there.

There are tons of things that could be done to make the output more attractive and useful. And yes, for nitpickers, the code has a couple of slight inefficiencies :-)

FluidDB domain names available early (and free) for Twitter users

Saturday, January 24th, 2009

Sometime in the next few months, Fluidinfo will launch an alpha version of FluidDB, the database with the heart of a wiki. It’s a big engineering task, and there will still be a lot to do when we go into alpha, so we’ll initially only have a small number of applications being built on FluidDB.

But that doesn’t mean you can’t get into the action early.

Starting today, we’re pleased to offer FluidDB domains for free to Twitter users. This is perhaps the simplest way you’ll ever sign up for a new web service – if you’re a Twitter user:

Simply follow FluidDB on Twitter.

Yes, that’s it. You’re done.

Later, when we create your FluidDB domain, we’ll send you your FluidDB password via a direct message in Twitter. Note that we haven’t asked for your real name, your email, a password, sent you a cookie, or asked you to fill out a pesky sign-up form. The point here is simply to give you an early opportunity to trivially claim your preferred name.

Feel free to tweet the URL of this posting (http://bit.ly/bezc). You can follow me too for extra credit. If you’re not already a Twitter user and you want a free FluidDB domain name, sign up for Twitter, and then follow FluidDB.

Mini FAQ:

Why would I do this? By following FluidDB you will reserve your (Twitter) user name as your domain name in FluidDB.

Is there any charge? No.

What is a FluidDB domain? Sorry, but you’ll have to wait to find out the answer to this. We can tell you though that FluidDB domains will have many uses, and that they won’t all be free.

What if I change my mind? Just unfollow FluidDB on Twitter.

Why Twitter? Because we like Twitter. We may do a similar thing for other services, allowing users to later claim their domain via OpenID, but that introduces the potential of naming conflicts.

Finally, please note that we can’t give an iron-clad guarantee that you’ll get your Twitter user name as your FluidDB domain name, but we’ll do our best. At this early stage of the game, we reserve the right to do whatever we want :-)

Who signed up for Twitter immediately before/after you?

Wednesday, January 14th, 2009

This is just a quick hack, done in about 20 minutes in 32 lines of Python. The following script will print out the Twitter screen names of the people who signed up immediately before and after a given user.

import sys
from twitter import Api
from operator import add
from functools import partial

inc = partial(add, 1)
dec = partial(add, -1)
api = Api()

def getUser(u):
    try:
        return api.GetUser(u)
    except Exception:
        return None

def do(name):
    user = getUser(name)
    if user:
        for f, what in (dec, 'Before:'), (inc, 'After:'):
            i = user.id
            while True:
                i = f(i)
                u = getUser(i)
                if u:
                    print what, u.screen_name
                    break
    else:
        print 'Could not find user %r' % name

if __name__ == '__main__':
    for name in sys.argv[1:]:
        do(name)
 

I’m happy to have reached the point in my Python development where I can pretty much just type something like this in without really having to think, including the use of operator.add and functools.partial.

BTW, the users who signed up immediately before and after I did were skywalker and kitu012.

The above is just a hack. Notes:

  1. If it can’t retrieve a user for any reason, it just assumes there is no such user.
  2. Twitter periodically deletes accounts of abusers, so the answer will skip those.
  3. Twitter had lots of early hiccups, so there may be no guarantee that user ids were actually assigned sequentially.
  4. This script may run forever.
  5. I’m using the Python Twitter library written by DeWitt Clinton. It’s been a while since it was updated, and it doesn’t give you back the time a user was created in Twitter. It would be fun to print that too.
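Note 4 could be addressed by bounding the search. Here’s a minimal sketch, assuming a hypothetical `lookup(uid)` that returns a user or `None` (as `getUser` above does); `find_neighbour` and `max_probes` are illustrative names, and the limit of 1000 probes is arbitrary.

```python
def find_neighbour(user_id, lookup, step, max_probes=1000):
    """Return the first user found walking from user_id by step (-1 or +1),
    or None if max_probes consecutive ids turn up nothing.

    lookup(uid) is a hypothetical callable returning a user or None.
    """
    uid = user_id
    for _ in range(max_probes):
        uid += step
        if uid < 1:
            break  # User ids are positive; nothing exists below 1.
        user = lookup(uid)
        if user is not None:
            return user
    return None
```

Calling it once with `step=-1` and once with `step=1` gives the before/after pair, with a guaranteed upper bound on the number of API calls.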

As you were.

10,000 things: Andrew Hensel lives (on Twitter)

Monday, January 5th, 2009

Andrew Hensel was an extraordinary human being.

We were graduate students together at The University of Waterloo in Canada in 1986-88. I met him on my first day there and we spent many hours together on a daily basis over the next 2.5 years. I don’t want to try to say too much about him now. It occurred to me a few days ago that I might post a few stories here. We did lots of crazy things. At one point I had wanted to write something titled “100 things to a Hensel” and I made a bunch of notes, but it went no further.

I wrote about him in my Ph.D. acknowledgments in 1995:

Andrew Hensel, with whom I shared so much of my two and a half years at Waterloo, was the most original and creative person I have ever known well. Together, we dismantled the world and rebuilt it on our own crazy terms. We lived life at a million miles an hour and there was nothing like it. Five years ago, Andrew killed himself. There have been few days since then that I have not thought of him and the time we spent together.

I still think about him frequently. Today I was remembering one of his many, many oddball projects (most of which went unfinished), which he called “10,000 things”. It was to be a list of 10,000 things that he thought of. By the time he started sending them to me we had both dropped out of Waterloo. He was back in Australia and I was in Munich.

He only sent me 300 of the to-be 10,000. Of course I still have them. They’re all very short. At the risk of being thought macabre I’ve decided to bring Andrew back a very little and post them to Twitter, chosen at random, one a day. You can follow adhensel to get just a glimpse of his mind. The first tweet, people being planted into earth, is already up.

There are at least half a dozen twitterers who knew Andrew, including one who knew him probably better than anybody. Once in a while I get email from someone who finds my online mentions of him. Invariably they also found him extraordinary.

What would Andrew have made of Twitter? I have no doubt at all that he’d have immediately dismissed it as “weak”. That was one of his favorite adjectives. Almost everything was weak. It’s a small miracle to me to partly bring him back to life 18 years after he died, by posting just some of his 10,000 things to Twitter.

And… my apologies to anyone who knew Andrew and who finds this upsetting.

Not alone

Friday, December 5th, 2008

Robert Scoble has just written a really nice article about Fluidinfo, calling us both “world-changing” and “unfundable”. Funnily, Tim O’Reilly said something similar when I talked to him at OATV. He said something like: “This could take over the world” and in the very same sentence “but I don’t see how we could fund you.” The two things an entrepreneur most and least wants to hear, all in one sentence. I’ll never forget it.

A few people have mailed me to say that the Scoble videos create the incorrect impression that I’m building FluidDB alone. So I wanted to clear that up. Others who are actively involved in Fluidinfo are:


Esteve Fernandez is doing the most difficult coding. Esteve and I are the only two employees of the company. We even have modest salaries. We spend most of our time apart, writing code, swapping email. Once or twice a week we meet in person to talk about architecture, current problems, or for him to gently explain to me how I could have written my code more elegantly and usefully. I usually try to stay out of his way, as he’s a force of nature and I just slow him down. He left a solid and secure job that he liked in Barcelona and then said no to Google to join Fluidinfo.

Esther Dyson invested in Fluidinfo just over a year ago. Esther is an incredible investor to have involved for a company like Fluidinfo. I won’t try to summarize, except to say that without her support we probably wouldn’t be here today. After a year of trying to find investors, I’m more keenly aware than ever of how extraordinary Esther is.

Russell Manley is the other company director. It was Russell who pointed Delicious out to me a few years ago and got me back onto working on this project after I’d put it aside for 6 years. Russell is a finance guy with a ton of experience in operations and running companies. He’s an investment director at Land Securities in London, and sits on over 30 boards. He’s also a close friend, incredibly smart, and widely read. I hope one day we’ll be able to get him into Fluidinfo, though that will take some doing.

Nicholas Radcliffe is an old friend and advisor. He’s the founder and CEO of Stochastic Solutions. He was also a founder, CTO, and then CEO of Quadstone, raising tens of millions of pounds along the way. Quadstone was acquired a couple of years ago. He’s into algorithmic approaches to targeted direct marketing, and he’s very successful. He has a Ph.D. in physics, so you don’t want to mess with Nick. He’s also an advisor to Scottish Equity Partners. Nick is my harshest and most unrelenting critic.

That’s it for now. There are probably a dozen others who are peripherally involved, but not on a day-to-day basis. I’m very happy to have just two people on payroll right now. We’re pretty much recession-proof. I went through the 2000-2004 period as CTO of Eatoni in New York, and we survived by cutting every possible cost and keeping our headcount as low as possible. So operating on a shoestring comes pretty naturally. I feel we’re strong and small like a hard nut, and not really exposed to the economic downturn. It’s a great time to be tiny and to be focussed on building a product.

It would of course be nice to be properly funded. But I’ve always been confident that’s just a matter of time. The main thing, perhaps the only thing, is to get an alpha version of FluidDB released so people can start building things on it.

Twittendipity: a chance interview with Robert Scoble

Thursday, December 4th, 2008

On Monday Tim O’Reilly posted a Twitter tweet suggesting to Robert Scoble that he contact me while in Barcelona.

First off, Tim is very generous in doing this. He’s ultra connected and he spends a significant amount of his time in Twitter pointing things out, connecting people, and re-tweeting stuff he finds interesting. Re-tweeting is really important because when you tweet you only reach the people who are already following you. But when someone re-tweets you, you reach new people who likely have no idea of your existence. And when Tim does the re-tweeting there can be a big impact. 24 hours after his message to Robert I had 50 new followers. Tim explicitly tries to help people doing things he finds interesting, but who have just a small number of Twitter followers. He filters and amplifies information, broadcasting it out to his 16,000+ followers. Robert was in a hotel about 10 minutes’ walk from my place and I had no idea. A mutual friend in California noticed and took a minute to connect us. That’s really something, and it perfectly illustrates some of the value of Twitter.

I met Robert yesterday afternoon and we spent 6 hours together. It was great. You can see at once why he’s been so successful: he’s smart, he’s thoughtful, he’s sympathetic, and he’s a careful listener. I had no idea what to expect, and seeing as what we’re building can take some time to sink in, I wondered what sort of an audience he’d be.

After we’d climbed around up in the Sagrada Familia (official site, wikipedia), Robert came back to my place to see a demo of the things I’d been describing. We sat down and he pulled out his cell phone and asked if he could film me. I didn’t really think about it and said of course. It didn’t dawn on me that we were doing an informal interview, and I was totally unprepared – which is probably a good thing.

In the end we filmed 4 segments: parts one, two, three, and four. There’s also been some discussion here on Robert’s FriendFeed page.

So if you’ve been wondering what we’re building in here, go watch the videos.

I had no idea all this was about to come down. The Fluidinfo web site (a generous word) was a single page with no contact information, no nuthin’. We simply haven’t needed a web site of any description yet. I went and added a box so you can sign up to receive news of the alpha launch.

And then there was this, posted on Twitter, and which I have absolutely no shame in reproducing (this is a blog, after all):

Wow, what @terrycojones showed me last night (a new kind of database that he’s been workng on for 11 years) blew me away. Uploading vids now

Now I have to put my head back down with Esteve to get the alpha out the door ASAP.

Amazon SimpleDB a complete flop?

Tuesday, December 2nd, 2008

Today Amazon slashed the price on storage in SimpleDB from $1.50 per GB per month to just $0.25 per GB per month.

Note that you can buy a 1TB hard drive these days for $75. That’s 7.5 cents per GB for as long as the drive lasts. So Amazon were charging 20 times the price of retail hard disk storage per month. Yes, the AWS storage is replicated, and you don’t need a data center or employees, but a 20X markup (per month) seemed a bit excessive. Until last night, that $1.50 figure was the first price in the pricing section of the SimpleDB page – not a smart move (sticker shock). The storage price is now the last thing in the pricing section.

I spend a bunch of time talking to folks working at other startups. I hear about EC2 and S3 usage all the time, but I’ve never heard of anyone using SimpleDB. I hadn’t really thought about it too much. I had noticed that the price for storage in SimpleDB is (was) 10 times higher than for storage in S3, and thought that created an opportunity for Fluidinfo. But that huge difference is now gone – in fact SimpleDB is now free for everyone for the first 6 months following the public beta.

I found myself asking “What’s going on?” It’s not like Amazon to suddenly offer their services for free. Tying the free offer to the service entering beta seemed a pretty thin excuse. If anything, it should get more expensive or stay the same, not suddenly become free.

Then I began to explicitly wonder just how many people are actually using SimpleDB. So I just ran some sample Google queries to get an idea. The results are amazing:

Query                             Google hits
“using amazon simpleDB”                    68
“using simpleDB”                         1010
“simpleDB sucks”                            3
“love simpleDB”                             1
“hate simpleDB”                             0
“recommend simpleDB”                        0
“we are using simpleDB”                     0
“we are using amazon simpleDB”              0
“we use amazon simpleDB”                    1
“we use simpleDB”                           4

Note that all queries are entered into Google in quotes.

Given just these results, and knowing that SimpleDB was launched a year ago, I think you’d have to conclude that SimpleDB is a complete flop. Either that or Google is playing evil tricks due to their own App Engine offering. That would seem unlikely. Plus, the numbers for the obviously popular S3 and EC2 are much, much higher: if you try these queries with S3 or EC2 instead of SimpleDB, you’ll see 5K, 10K, 15K results.

I find the above numbers astounding. I’m deadly curious to know what’s going on here. Was SimpleDB just too expensive to consider using? Is its model too awkward? If it sucked, people would say so. But there’s virtually nothing out there. It’s as though developers took one look and completely ignored it. That would be my guess (in fact it’s what I did, so I’m probably biased in my explanation of what others may have done).

At least we can say that more people love SimpleDB than hate it :-)

It’s not my intention to bash Amazon or AWS. I love and use S3 and EC2 every single day. They’ve changed the world, and this is only the beginning. But I have no use at all for SimpleDB. I’d always assumed it was a big success too, but it looks like that may be wrong.

Comments very welcome. Do you know anyone using SimpleDB?

Changing POV under Twitter

Wednesday, November 26th, 2008

One thing I’d like to be able to do in Twitter is change my point of view. That is, see what Twitter looks like from the POV of another user.

Given Twitter’s asymmetric follower model and the prevalence of @ messaging, it’s very common to run across a fragment of a conversation that seems potentially interesting. It’s also common not to be following the full set of people who are interacting.

For example, four people might be exchanging tweets on a subject, and you may follow just one of them. So you’ll see roughly one quarter of the thread. Right now, to get the context for the discussion you need to go take a look at the archives of the various people and try to piece the conversation together. You have to do this one tweeter at a time. Or you could temporarily follow the people involved and then page backwards through time to see the flow of tweets. With some work on the server side, Twitter could let you see this using the Twitter search interface (you’d need to put in the names of the various parties though).

It would be much simpler and much cooler to just click a link beside a user’s name and get that user’s POV. You’d see what they see, except for the people whose tweets are private and which you’re prevented from seeing by the Twitter permission system. Not only could you see more or all of a conversation, I bet it would be really interesting to see Twitter from someone else’s POV. You could click on the @Replies tab to see all replies to that user, etc. There’s no reason why not – it’s all public data, and you can easily fetch the @replies using the search interface. I think wandering around inside the Twitterverse jumping from the POV of one identity to another would be fascinating. It reminds me of wandering around inside the wayback machine, except it’s the present.

That would all be pretty easy to implement, even for a 3rd party using the Twitter API. It would be nice if Twitter were to implement it themselves. I could do the basics myself in a few hours, but I’d rather not. This is also something that could be accessed via a Firefox extension or Greasemonkey – install it and get an extra button next to every tweet. The button switches you to the POV of the tweeter.
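At its core, building someone else’s POV is a merge of the timelines of the people they follow, filtered by what you’re allowed to see. Here’s a minimal sketch under stated assumptions: `pov_timeline` is a hypothetical function, and the `following` and `tweets_by_user` dictionaries and the `can_see` callable are hypothetical inputs you would first assemble from API calls.

```python
import heapq
from itertools import islice


def pov_timeline(user, following, tweets_by_user, can_see, limit=20):
    """Approximate the home timeline `user` would see.

    following: dict mapping a screen name to the names it follows.
    tweets_by_user: dict mapping a name to a list of (timestamp, text)
        tuples, newest first.
    can_see(author): False for protected accounts the viewer may not read.
    Returns up to `limit` (timestamp, author, text) tuples, newest first.
    """
    authors = [user] + sorted(following.get(user, ()))
    streams = [
        [(ts, author, text) for ts, text in tweets_by_user.get(author, [])]
        for author in authors
        if can_see(author)
    ]
    # Every stream is newest-first, so merge in descending timestamp order.
    merged = heapq.merge(*streams, key=lambda t: t[0], reverse=True)
    return list(islice(merged, limit))
```

The merge is lazy, so only the first `limit` tweets are actually pulled from the per-author lists. (The `key` and `reverse` arguments to `heapq.merge` need Python 3.5 or later.)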

All we need is someone to build it.

I have several more Twitter blog posts I’d love to write. The most interesting, to me, is all about evolutionary biology, sex, and the meaning of life itself. But no time, no time. I’ve finally added a Twitter category to this blog, and was surprised to find 14 posts that fit it. Am I obsessed?

As usual, make sure you follow me :-)

Passion and the creation of highly non-uniform value

Monday, November 10th, 2008

Here, finally, are some thoughts on the creation of value. I don’t plan to do as good a job as the subject merits, but if I don’t take a rough stab at it, it’ll never happen.

I’ll first explain what I mean by “the creation of highly non-uniform value”. I’m talking about ideas that create a lot of (monetary) value for a very small number of people. If you made a graph and on the X axis put all the people in the world, in sorted order of how much they make from an idea, and on the Y axis you put value they each receive, we’re talking about distributions that look like the image above, but much more skewed.

In other words, a setting in which a very small number of people try to get extremely rich. I.e., startup founders, a few key employees, their investors, and their investors’ investors. BTW, I don’t want to talk about the moral side of this, if there is one. There’s nothing to stop the obscenely rich from giving their money away or doing other charitable things with it.

So let’s just accept that many startup founders, and (in theory) all venture investors, are interested in turning ideas into wealth distributions that look like the above.

I was partly beaten to the punch on this post by Paul Graham in his essay Why There Aren’t More Googles? Paul focused on VC caution, and with justification. But there’s another important part of the answer.

One of the most fascinating things I’ve heard in the last couple of years is an anecdote about the early Google. I wrote about it in an earlier article, The blind leading the blind:

…the Google guys were apparently running around search engine companies trying to sell their idea (vision? early startup?) for $1M. They couldn’t find a buyer. What an extraordinary lack of... what? On the one hand you want to laugh at those idiot companies (and VCs) who couldn’t see the huge value. OK, maybe. But the more extraordinary thing is that Larry Page and Sergey Brin couldn’t see it either! That’s pretty amazing when you think about it. Even the entrepreneurs couldn’t see the enormous value. They somehow decided that $1M would be an acceptable deal. Talk about a lack of vision and belief.

So you can’t really blame the poor VCs or others who fail to invest. If the founding tech people can’t see the value and don’t believe, who else is going to?

I went on to talk about what seemed like it might be a necessary connection between risk and value.


True value

Following on…

After more thought, I’m now fairly convinced that I was on the right track in that post.

It seems to me that the degree to which a highly non-uniform wealth distribution can be created from an idea depends heavily on how non-obvious the value of the idea is.

If an idea is obviously valuable, I don’t think it can create highly non-uniform wealth. That’s not to say that it can’t create vast wealth, just that the distribution of that wealth will be more widely spread. Why is that the case? I think it’s true simply because the value will be apparent to many people, there will be multiple implementations, and the value created will be spread more widely. If the value of an idea is clear, others will be building it even as you do. You might all be very successful, but the distribution of created value will be more uniform.

Obviously it probably helps if an idea is hard to implement too, or if you have some other barrier to entry (e.g., patents) or create a barrier to adoption (e.g., users getting positive reinforcement from using the same implementation).

I don’t mean to say that an idea must be uniquely brilliant, or even new, to generate this kind of wealth distribution. But it needs to be the kind of proposition that many people look at and think “that’ll never work.” Even better if potential competitors continue to say that 6 months after launch and there’s only gradual adoption. Who can say when something is going to take off wildly? No-one. There are highly successful non-new ideas, like the iPod or YouTube. Their timing and implementation were somehow right. They created massive wealth (highly non-uniformly distributed in the case of YouTube), and yet many people wrote them off early on. It certainly wasn’t plain sailing for the YouTube founders – early adoption was extremely slow. Might Twitter, a pet favorite (go on, follow me), create massive value? Might Mahalo? Many people would have found that idea ludicrous 1-2 years ago – but that’s precisely the point. Google is certainly a good example – search was supposedly “done” in 1998 or so. We had AltaVista, and it seemed great. Who would’ve put money into two guys building a search engine? Very few people.

If it had been obvious the Google guys were doing something immensely valuable, things would have been very different. But they traveled around to various companies (I don’t have this first hand, so I’m imagining), showing a demo of the product that would eventually create $100-150B in value. It wasn’t clear to anyone that there was anything like that value there. Apparently no-one thought it would be worth significantly more than $1M.

I’ve come to the rough conclusion that that sort of near-universal rejection might be necessary to create that sort of highly non-uniform wealth distribution.

There are important related lessons to be learned along these lines from books like The Structure of Scientific Revolutions and The Innovator’s Dilemma.

Now back to Paul’s question: Why aren’t there more Googles?

Part of the answer has to be that value is non-obvious. Given the above, I’d be willing to argue (over beer, anyway) that that’s almost by definition.

So if value is non-obvious, even to the founders, how on earth do things like this get created?

The answer is passion. If you don’t have entrepreneurs who are building things out of sheer driving passion, then hard projects that require serious energy, sacrifice, and risk-taking simply won’t be built.

As a corollary, big companies are unlikely to build these things – because management is constantly trying to assess value. That’s one reason to rue the demise of industrial research, and a reason to hope that cultures that encourage people to work on whatever they want (e.g., Google, Microsoft Research) might one day stumble across this kind of value.

This gets me to a recent posting by Tim Bray, which encourages people to work on things they care about.

It’s not enough just to have entrepreneurs who are trying to create value. As I’m trying to say, practically no-one can consistently and accurately predict where undiscovered value lies (some would argue that Marc Andreessen is an exception). If it were generally possible to do so, the world would be a very different place – the whole startup scene and venture/angel funding system would be different, supposing they even existed. Even if it looks like a VC or entrepreneur can infallibly put their finger on undiscovered value, they probably can’t. One-time successful VCs and entrepreneurs go on to attract disproportionately many great companies, employees, funding, etc., the next time round. You can’t properly separate their raw ability to see undiscovered value from the strong bias towards excellence in the opportunities they are later afforded. Successful entrepreneurs are often refreshingly and encouragingly frank about the role of luck in their success. They’re done. VCs are much less candid: they’re supposed to have natural talent, and they’re trying to manufacture the impression that they know what they’re doing. They have to do that in order to get their limited partners to invest in their funds. For all their vaunted insight, only roughly 25% of VCs provide returns that are better than the market. The percentage generating huge returns will of course be much smaller, as in turn will be those doing so consistently. I reckon the whole thing’s a giant crap shoot. We may as well all admit it.

I have lots of other comments I could make about VCs, but I’ll restrict myself to just one as it connects back to Paul’s article.

VCs who claim to be interested in investing in the next Google cannot possibly have the next Google in their portfolio unless they have a company whose fundamental idea looks like it’s unlikely to pan out. That doesn’t mean VCs should invest in bad ideas. It means that unless VCs make bets on ideas that look really good – but which are e.g., clearly going to be hard to build, will need huge adoption to work, appear to be very risky long-shots, etc. – then they can’t be sitting on the next Google. It also doesn’t mean VCs must place big bets on stuff that’s highly risky. A few hundred thousand can go a long way in a frugal startup.

I think this is a fundamental tradeoff. You’ll very frequently hear VCs talk about how they’re looking for companies that are going to create massive value (non-uniformly distributed, naturally), with massive markets, etc. I think that’s pie in the sky posturing unless they’ve already invested in, or are willing to invest in, things that look very risky. That should be understood. And so a question to VCs from entrepreneurs and limited partners alike: if you claim to be aiming to make massive returns, where are your necessary correspondingly massively risky investments? Chances are you won’t find any.

There is a movement in the startup investment world towards smaller funds that make smaller investments earlier. I believe this movement is unrelated to my claim about non-obviousness and highly non-uniform returns. The trend is fuelled by the realization that lots of web companies are getting going without the need for traditional levels of financing. If you don’t get in early with them, you’re not going to get in at all. A big fund can’t make (many) small investments, because their partners can’t monitor more than a handful of companies. So funds that want to play in this area are necessarily smaller. I think that makes a lot of sense. A perhaps unanticipated side effect of this is that things that look like they may be of less value end up getting small amounts of funding. But on the whole I don’t think there’s a conscious effort in that direction – investors are strongly driven to select the least risky investment opportunities from the huge number of deals they see. After all, their jobs are on the line. You can’t expect them to take big risks. But by the same token you should probably ignore any talk of “looking for the next Google”. They talk that way, but they don’t invest that way.

Finally, if you’re working on something that’s being widely rejected or whose value is being widely questioned, don’t lose heart (instead go read my earlier posting) and don’t waste your time talking to VCs. Unless they’re exceptional and serious about creating massive non-uniformly distributed value, and they understand what that involves, they certainly won’t bite.

Instead, follow your passion. Build your dream and get it out there. Let the value take care of itself, supposing it’s even there. If you can’t predict value, you may as well do something you really enjoy.

Now I’m working hard to follow my own advice.

I had to learn all this the hard way. I spent much of 2008 on the road trying to get people to invest in Fluidinfo, without success. If you’re interested to know a little more, earlier tonight I wrote a Brief history of an idea to give context for this posting.

That’s it for now. Blogging is a luxury I can’t afford right now, not that I would presume to try to predict which way value lies.

Expecting and embracing startup rejection

Sunday, November 9th, 2008

When I was younger, I didn’t know what to make of it when people rejected my ideas. Instead of fighting it, trying again, or improving my delivery, I’d just conclude that the rejector was an idiot, and that it was their loss if they didn’t get it.

For example, I put considerable time and effort into writing academic papers, several of which were rejected, to my surprise. I’d never considered that the papers might not be accepted. When this happened, I wouldn’t re-submit them or try to re-publish them. By then I would usually have moved on to doing something else anyway.

When I applied for jobs, it never entered my mind that I might not be wanted. How could anyone not want me? After a couple of years working on my current ideas, I applied for a computer science faculty position at over 40 US universities. I refused to emphasize my well-received and published Ph.D. work, of which I was and am still proud, because I was no longer working in that area.

I was convinced the new ideas would be recognized as being strong.

But guess what? I was summarily rejected by all 40+ universities. I only got one interview, at RPI. No other school even wanted to meet me. I kept all the rejection letters. I still have them. (Amusingly, I was swapping emails with Ben Bederson earlier this year and it transpired that he’d had the same experience, also with 40 universities, and he too kept all his rejection letters!)

You never learn more than when you’re being humbled.

I’ve now returned to those same ideas and have been working on them for the last 3 years. In January 2007 I went and met with a couple of the most appropriately visionary VCs to tell them what I was building. I was naïve enough to think they might back me at that early point. Wrong. They suggested I come back with a demo to concretely illustrate what the system would allow people to do. That was easier said than done – the system is not simple. I spent 2007 building the core engine, a 90% fully-functional demo of the major application, several smaller demo apps (including a Firefox toolbar extension built by Esteve Fernandez), and added about 20 sample data sets to further illustrate possibilities.

That’ll show ‘em, right? I went out in November 2007 armed with all this, and began talking to a variety of potential investors. I was sure VCs would be falling over themselves to invest, especially given that we were working on some mix of innovative search, cloud computation, APIs, and various Web 2.0 concepts, and that tons of VCs claimed to be looking for the Next Big Thing in search, and for Passionate Entrepreneurs tackling Hard Problems who wanted to build Billion Dollar Companies, etc., etc.

You guessed it. Over the next year literally dozens of potential investors all said no. The demo wasn’t enough. Would people use it? Could we build the real thing? Would it scale? Where was the team? What are you doing in Barcelona? “Looks fascinating, do please let us know when you’ve released it and are seeing adoption,” they almost universally told me. The standout exception to this was Esther Dyson, who agreed to invest immediately after seeing the demo, and whose courage I hope I can one day richly reward.

What to make of all this rejection?

One thing that became clear is that if you’re smarter than average, you’ll almost by definition be constantly thinking of things too early. Maybe many years too early. Your ideas will seem strange, doubtful, and perhaps plain wrong to many people.

This makes you realize how important timing is.

Being right with an idea too early and trying to build a startup around it is similar to correctly knowing a company is going to fail, and immediately rushing out to short its stock. Even though you’re right, you can be completely wiped out if the stock’s value rises in the short term. You were brilliant, insightful, and 100% correct – but you were too early.

Getting timing right can clearly be partly based on calculation and reason. But given that many startups are driven by founder passion, I think luck in timing plays an extremely important role in startup success. And the smarter and more far-sighted you are, the greater the chance that your timing will be wrong.

So that’s the first thing to understand: if you’re smarter than average, your ideas will, on average, be ahead of their time. Some level of rejection comes with the territory.

But I’d go much further than that, and claim that if you are not seeing a very high level of rejection in trying to get a new idea off the ground, you’re probably not working on anything that’s going to change the world or be extremely valuable.

That might sound like an outrageous extrapolation (or even wishful thinking, given my history). Later tonight I plan to explain this claim in a post on the connections between passion, value, non-obviousness, and rejection. That’s the subject I really want to write about.

For now though, I simply want to say that I’ve come to understand that having one’s ideas regularly rejected is a good sign. It tells you you’re either on a fool’s errand, or that you’re doing something that might actually be valuable and important.

If you’re not going to let rejection get you down, you might content yourself with learning to ignore it. But you can do better. You can come to regard it as positive and affirming. Without becoming pessimistic or in any way accepting defeat, you can come to expect to be rejected and even to embrace it.

If you can do that, rejection loses its potential for damage. As Paul Graham pointed out, the impact of disappointment can destroy a startup. That’s an important observation, and a part of why startups can be so volatile and such a wild ride.

I don’t mean to suggest that you don’t also do practical things with rejection too – like learn from it. That’s very important and will help you shape your product, thoughts, presentation, expectations, etc. Again, see Paul’s posting.

But I think the mental side of rejection is more important than the practical. The mental side has more destructive potential. You have to figure out how to deal with it. If you look at it the right way you can turn it into something that’s almost by definition positive, as I’ve tried to illustrate.

In a sense I even relish it, and use it for fuel. There are little tricks I sometimes use to keep myself motivated. I even keep a list of them (and no, you can’t see it). One is imagining that some day all the people who rejected me along the way will wring their hands in despair at having missed such an opportunity :-)

I’ve not been universally rejected, of course. There are lots of people who know what we’re doing and are highly supportive (more on them at a later point). If I’d been universally rejected, or rejected by many well-known people whose opinions I value, I probably would have stopped by now.

I’ve had to learn to see a high level of rejection as not just normal but a necessary (but not sufficient!) component of a correct belief that you’re doing something valuable.

Stay tuned for the real point of this flurry of blogging activity.

Twitter’s amazing stickiness (with a caveat)

Friday, October 31st, 2008

I just followed a link to a site that shows the date of the first tweet of 50 early Twitter users. I wondered how many of these early users were still active users, and guessed many would be.

Instead of going and fetching each user’s last tweet by hand, I wrote a little shell script to do all the work:

# Print each early user's name followed by the time of their latest tweet.
for name in \
  `curl -s http://myfirsttweet.com/oldest.php |
   perl -p -e 's,<a href="http://myfirsttweet.com/1st/(\w+)">,\nNAME:\t$1\n,g' |
   egrep '^NAME:' |
   cut -f2 |
   uniq`
do
    echo $name \
      `curl -s "http://twitter.com/statuses/user_timeline/$name.xml?count=1" |
       grep created_at |
       cut -f2 -d\> |
       cut -f1 -d\<`
done
 

Who wouldn’t want to be a (UNIX) programmer!?

And the output, massaged into an HTML table:

User Last tweeted on
jack Thu Oct 30 03:41:49 +0000 2008
biz Thu Oct 30 22:24:12 +0000 2008
Noah Tue Oct 28 22:56:15 +0000 2008
adam Thu Oct 30 21:34:56 +0000 2008
tonystubblebine Fri Oct 31 00:53:38 +0000 2008
dom Thu Oct 30 20:36:31 +0000 2008
rabble Fri Oct 31 00:56:28 +0000 2008
kellan Fri Oct 31 00:32:44 +0000 2008
sarahm Thu Oct 30 22:45:37 +0000 2008
dunstan Thu Oct 30 23:59:57 +0000 2008
stevej Fri Oct 31 00:12:03 +0000 2008
lemonodor Thu Oct 30 18:21:43 +0000 2008
blaine Wed Oct 29 23:52:06 +0000 2008
rael Fri Oct 31 01:02:58 +0000 2008
bob Fri Oct 31 00:39:18 +0000 2008
graysky Fri Oct 31 00:23:21 +0000 2008
veen Thu Oct 30 19:47:40 +0000 2008
dens Fri Oct 31 00:13:12 +0000 2008
heyitsnoah Thu Oct 30 20:09:35 +0000 2008
rodbegbie Thu Oct 30 23:42:39 +0000 2008
astroboy Thu Oct 30 22:07:50 +0000 2008
alba Thu Oct 30 16:06:29 +0000 2008
kareem Thu Oct 30 20:20:14 +0000 2008
gavin Thu Oct 30 17:48:45 +0000 2008
nick Fri Oct 31 01:17:29 +0000 2008
psi Thu Oct 30 20:40:53 +0000 2008
vertex Fri Oct 31 00:44:09 +0000 2008
mulegirl Fri Oct 31 00:31:05 +0000 2008
thedaniel Thu Oct 30 20:00:31 +0000 2008
myles Thu Oct 30 15:50:31 +0000 2008
mike ftw Fri Oct 31 00:28:00 +0000 2008
stumblepeach Thu Oct 30 23:20:06 +0000 2008
bunch Sat Oct 25 20:46:42 +0000 2008
adamgiles com Thu Apr 10 17:22:52 +0000 2008
naveen Thu Oct 30 23:24:23 +0000 2008
nph Fri Oct 31 01:53:13 +0000 2008
caterina Tue Oct 28 18:07:32 +0000 2008
rafer Thu Oct 30 19:23:50 +0000 2008
ML Thu Oct 30 15:31:47 +0000 2008
brianoberkirch Thu Oct 30 20:21:43 +0000 2008
joelaz Thu Oct 30 22:03:59 +0000 2008
arainert Fri Oct 31 01:18:43 +0000 2008
tony Sun Oct 26 18:16:02 +0000 2008
brianr Fri Oct 31 01:57:27 +0000 2008
prash Tue Oct 28 22:14:24 +0000 2008
danielmorrison Thu Oct 30 21:37:41 +0000 2008
slack Fri Oct 31 01:26:08 +0000 2008
mike9r Thu Oct 30 21:17:29 +0000 2008
monstro Thu Oct 30 22:28:46 +0000 2008
mat Fri Oct 31 00:26:22 +0000 2008

Wow… look at those dates. Only one of these people has failed to update in the last week!
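To check the “only one in the last week” reading mechanically, here is a small Python sketch that parses rows in the format of the table above and flags anyone whose last tweet is more than seven days before a reference time. Two things are my assumptions, not part of the original post: the reference time (02:00 UTC on October 31st, 2008, just after the newest tweet in the table) and the choice to copy in only five of the 50 rows to keep it short.

```python
from datetime import datetime, timedelta, timezone

# Five rows copied from the table above (the full table has 50).
ROWS = """\
jack Thu Oct 30 03:41:49 +0000 2008
bunch Sat Oct 25 20:46:42 +0000 2008
adamgiles com Thu Apr 10 17:22:52 +0000 2008
tony Sun Oct 26 18:16:02 +0000 2008
mat Fri Oct 31 00:26:22 +0000 2008"""

# Layout of the created_at values, e.g. "Thu Oct 30 03:41:49 +0000 2008".
CREATED_AT = "%a %b %d %H:%M:%S %z %Y"

def parse_row(line):
    """Split a row into (user name, last tweet time).

    The timestamp is always the last six whitespace-separated fields, so
    anything before it (even a name containing a space) is the user name.
    """
    fields = line.split()
    when = datetime.strptime(" ".join(fields[-6:]), CREATED_AT)
    return " ".join(fields[:-6]), when

def stale_users(lines, now, window=timedelta(days=7)):
    """Return the names whose last tweet is more than `window` before `now`."""
    return [name for name, when in map(parse_row, lines) if now - when > window]

# Assumed reference time: shortly after the newest tweet in the table.
now = datetime(2008, 10, 31, 2, 0, tzinfo=timezone.utc)
print(stale_users(ROWS.splitlines(), now))  # ['adamgiles com']
```

Splitting from the right keeps the parser robust to names that lost an underscore or dot in the table (such as “adamgiles com”).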

Here’s the caveat. We don’t know how many early Twitter users are in the My First Tweet database. The data looks suspicious: there are only 50 Twitter users in a 7 month period? That can’t be right. So it’s possible the My First Tweet database is built by finding currently active tweeters and then looking back to their first post. If so, my table doesn’t say much about stickiness.

But I find it fairly impressive in any case.

Digging into Twitter following

Monday, October 13th, 2008

This is just a quick post. I have a ton of things I could say about this, but they’ll have to wait – I need to do some real work.

Last night and today I wrote some Python code to dig into the follower and following sets of Twitter users.

I also think I understand better why Twitter is so compelling, but that’s going to have to wait for now too.

You give my program some Twitter user names and it builds you a table showing numbers of followers, following, etc. for each user. It distinguishes between people you follow who don’t follow you back, and people who follow you but whom you don’t follow back.
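The per-user breakdown described above comes down to a few set operations. This is a minimal sketch, not the program itself: it assumes one user’s friend and follower lists have already been fetched into Python sets (fetching them from Twitter is not shown), and the account names are hypothetical.

```python
def follow_summary(following, followers):
    """Partition one user's relationships into three sets."""
    return {
        "mutual": following & followers,
        # I follow them, they don't follow me.
        "not_following_back": following - followers,
        # They follow me, I don't follow them.
        "not_followed_back": followers - following,
    }

# Hypothetical account names, purely for illustration.
following = {"alice", "bob", "carol"}
followers = {"bob", "dave"}
summary = follow_summary(following, followers)
for label, names in summary.items():
    print(f"{label}: {len(names)} {sorted(names)}")
```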

But the really interesting thing is to look at the intersection of some of these sets between users.

For example, if I follow X and they don’t follow me back, we can assume I have some interest in X. So if I am later followed by Y and it turns out that X follows Y, I might be interested to know that. I might want to follow Y back just because I know it might bring me to the attention of X, who may then follow me. If I follow Y, I might want to publicly @ message him/her, hoping that he/she might @ message me back, and that X may see it and follow me.
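The X/Y reasoning above can be expressed as a set intersection. In this sketch, the function name and the `following_of` mapping (user name to the set of accounts that user follows) are hypothetical; it assumes you have already fetched the friend lists of the people you follow.

```python
def bridge_candidates(my_following, my_followers, following_of):
    """Find followers Y worth following back because some X that I follow
    (but who doesn't follow me) follows Y.

    following_of maps a user name to the set of accounts that user follows.
    Returns a dict mapping each such Y to the sorted list of Xs that follow Y.
    """
    wanted = my_following - my_followers      # X: I follow them, no follow-back
    candidates = my_followers - my_following  # Y: they follow me, I don't follow them
    result = {}
    for y in sorted(candidates):
        bridges = sorted(x for x in wanted if y in following_of.get(x, set()))
        if bridges:
            result[y] = bridges
    return result

# Hypothetical data.
my_following = {"x1", "x2", "friend"}
my_followers = {"friend", "y1", "y2"}
following_of = {"x1": {"y1"}, "x2": {"y1", "someone_else"}}
print(bridge_candidates(my_following, my_followers, following_of))
```

Here y1 would be suggested because both x1 and x2 follow it, while y2 has no such connection.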

Stuff like that. If you think that sort of thing isn’t important, or is too detailed or introspective, I’ll warrant you don’t know much about primate social studies. But more on that in another posting too.

As another example use, I plan to forward the mails Twitter sends me telling me someone new is following me into a variant of my program. It can examine the sets of interest and weight them. That can give me an automated recommendation of whether I should follow that person back – or just do the following for me.

There are lots of directions you could push this in, like considering who the person had @ talked to (and whether those people were followers or not) and the content of their Tweets (e.g., do they talk about things I’m interested or not interested in?).

Lots.

For now, here are links to a few sample runs. Apologies to the Twitter users I’ve picked on – you guys were on my screen or on my mind (following FOWA).

I’d love to turn these into nice Euler Diagrams but I didn’t find any decent open source package to produce them.

I’m also hoping someone else (or other people) will pick this up and run with it. I’ve got no time for it! I’m happy to send the source code to anyone who wants it. Just follow me on Twitter and ask for it.

Example 1: littleidea compared to sarawinge.
Example 2: swardley compared to voidspace.
Example 3: aweissman compared to johnborthwick.

And finally here’s the result for deWitt, on whose Twitter Python library I based my own code. This is the output you get from the program when you only give it one user to examine.

More soon, I guess.

How many users does Twitter have?

Monday, October 13th, 2008

Inclusion/Exclusion

Here’s a short summary of a failed experiment using the Principle of Inclusion/Exclusion to estimate how many users Twitter has. I.e., there’s no answer below, just the outline of some quick coding.

I was wondering about this over cereal this morning. I know some folks at Twitter, and I know some folks who have access to the full tweet database, so I could perhaps get that answer just by asking. But that wouldn’t be any fun, and I probably couldn’t blog about it.

I was at FOWA last week and it seemed that absolutely everyone was on Twitter. Plus, they were active users, not people who’d created an account and didn’t use it. If Twitter’s usage pattern looks anything like a Power Law as we might expect, there will be many, many inactive or dormant accounts for every one that’s moderately active.

BTW, I’m terrycojones on Twitter. Follow me please, I’m trying to catch Jason Calacanis.

You could have a crack at answering the question by looking at Twitter user id numbers via the API and trying to estimate how many users there are. I did play with that at one point, at least with tweet ids, but although they increase there are large holes in the tweet id space. And approaches like that have to go through the Twitter API, which limits you to a mere 70 requests per hour – not enough for any serious (and quick) probing.
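To see why 70 requests per hour is so limiting, here is a minimal sketch of one way such probing could work. It is not what the post did, and it rests on assumptions of mine: you know the highest id in use, and a hypothetical `id_exists` lookup costs one API request. It probes random ids, measures the fraction that are live (i.e. not holes), and scales up. The simulated id space simply makes 40% of ids live. With 70 probes the approximate 95% interval on that fraction is about ±0.11 (1.96 × √(0.4 × 0.6 / 70)), so the final count is only known to within roughly ±30%.

```python
import math
import random

def estimate_live_ids(max_id, id_exists, probes, rng):
    """Estimate how many ids in 1..max_id belong to real accounts.

    Probes `probes` uniformly random ids and scales the hit rate up to the
    whole id range. Returns (estimate, half-width of an approximate 95%
    confidence interval) using the normal approximation to the binomial.
    """
    hits = sum(1 for _ in range(probes) if id_exists(rng.randint(1, max_id)))
    p = hits / probes
    half_width = 1.96 * math.sqrt(p * (1 - p) / probes)
    return max_id * p, max_id * half_width

# Simulated stand-in for an API lookup: 40% of ids are live, the rest are holes.
def simulated_exists(user_id):
    return user_id % 5 < 2

rng = random.Random(42)
MAX_ID = 10_000_000  # hypothetical highest id in use
for probes in (70, 7000):  # one hour vs. one hundred hours of requests
    est, hw = estimate_live_ids(MAX_ID, simulated_exists, probes, rng)
    print(f"{probes:5d} probes: {est:,.0f} +/- {hw:,.0f}")
```

The error shrinks only with the square root of the number of probes, so getting a tenfold tighter estimate costs a hundred times as many requests.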

In any case, I was looking at the Twitter Find People page. Go to the Search tab and you can search for users.

I searched for the single letter A, and got around 109K hits. That led me to think that I could get a bound on Twitter’s size using the Principle of Inclusion/Exclusion (PIE). (If you don’t know what that is, don’t be intimidated by the math – it’s actually very simple, just consider the cases of counting the size of the union of 2 and 3 sets). The PIE is a beautiful and extremely useful tool in combinatorics and probability theory (some nice examples can be found in Chapter 3 of the introductory text Applied Combinatorics With Problem Solving). The image above comes from the Wikipedia page.
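For reference, here are the two- and three-set cases and the general statement. In this post, $A_i$ would be the set of users whose name contains the $i$-th letter.

```latex
\[ |A \cup B| = |A| + |B| - |A \cap B| \]
\[ |A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C| \]
\[ \Bigl|\bigcup_{i=1}^{n} A_i\Bigr| = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < \cdots < i_k \le n} \bigl|A_{i_1} \cap \cdots \cap A_{i_k}\bigr| \]
```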

To get an idea of how many Twitter users there are, we can add the number of people with an A in their name to the number with a B in their name, and so on, up to the number with a Z in their name.

That will give us an over-estimate though, as names typically have many letters in them. So we’ll be counting users multiple times in this simplistic sum. That’s where the PIE comes in. The basic idea is that you add the size of a bunch of sets, and then you subtract off the sizes of all the pairwise intersections. Then you add on the sizes of all the triple set intersections, and so on. If you keep going, you get the answer exactly. If you stop along the way you’ll have an upper or lower bound.
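The stop-early bounds are easiest to see on a toy example. This is a minimal sketch on a handful of made-up user names (not Twitter data). It computes the running inclusion/exclusion sums over the sets “names containing a”, “names containing b”, and so on, and compares them with the true size of the union.

```python
from itertools import combinations

def pie_partial_sums(sets):
    """Return the running inclusion/exclusion sums S1, S1-S2, S1-S2+S3, ...

    Stopping after an odd number of terms gives an upper bound on the size
    of the union; stopping after an even number gives a lower bound
    (the Bonferroni inequalities). The final value is exact.
    """
    total, partials = 0, []
    for k in range(1, len(sets) + 1):
        # Sum of the sizes of all k-way intersections.
        term = sum(len(set.intersection(*combo)) for combo in combinations(sets, k))
        total += term if k % 2 == 1 else -term
        partials.append(total)
    return partials

# Made-up user names, grouped by which of the letters a-e they contain.
names = ["anna", "bob", "cab", "dan", "ed"]
by_letter = [{n for n in names if letter in n} for letter in "abcde"]
print(pie_partial_sums(by_letter), len(set().union(*by_letter)))
```

The first partial sum over-counts (9 versus the true 5), subtracting the pairwise overlaps undershoots (4), and adding the triple overlap lands exactly on 5. With all 26 letters the full expansion has 2^26 - 1 intersection terms, and every intersection size would have to come from somewhere other than the single-letter searches described here.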

So I figured I could add the size of all the single-letter searches and then adjust that downwards using some simple estimates of letter co-occurrence.

That would definitely work.

But then the theory ran full into the reality of Twitter.

To begin with, Twitter gives zero results if you search for S or T. I have no idea why. It gives a result for all other (English) letters. My only theory was that Twitter had anticipated my effort and the missing S and T results were their way of saying Stop That!

Anyway, I put the values for the 24 letters that do work into a Python program and summed them:

count = dict(a = 108938,
             b =  12636,
             c =  13165,
             d =  21516,
             e =  14070,
             f =   5294,
             g =   8425,
             h =   7108,
             i = 160592,
             j =   9226,
             k =  12524,
             l =   8112,
             m =  51721,
             n =  11019,
             o =   9840,
             p =   8139,
             q =   1938,
             r =  10993,
             s =      0,
             t =      0,
             u =   8997,
             v =   4342,
             w =   6834,
             x =   8829,
             y =   8428,
             z =   3245)

upperBoundOnUsers = sum(count.values())
print('Upper bound on number of users:', upperBoundOnUsers)

The total was 515,931.

Remember that that’s a big over-estimate due to duplicate counting.

And unless I really do live in a tech bubble, I think that number is way too small – even without adjusting it using the PIE.

(If we were going to adjust it, we could try to estimate how often pairs of letters co-occur in Twitter user names. That would be difficult as user names are not like normal words. But we could try.)
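Here is the simplest version of that suggestion. Assume that each letter of a name is an independent draw from English letter frequencies (a strong assumption, since user names are not normal words). Then the chance that a name contains both of two letters follows from inclusion/exclusion applied to the complementary events. A minimal sketch, with an assumed name length of 7 and a handful of the Wikipedia frequencies quoted in this post:

```python
from itertools import combinations

# A few of the English letter frequencies (from Wikipedia) used in this post.
freq = dict(a=0.08167, e=0.12702, i=0.06966, m=0.02406, q=0.00095)

def prob_both(fi, fj, n):
    """P(an n-letter name contains letter i and letter j at least once each).

    Assumes every position is an independent draw from English letter
    frequencies. Inclusion/exclusion on the complementary events gives
    P(i and j) = 1 - P(no i) - P(no j) + P(neither i nor j).
    """
    return 1.0 - (1.0 - fi) ** n - (1.0 - fj) ** n + (1.0 - fi - fj) ** n

n = 7  # assumed mean user name length, as elsewhere in this post
for x, y in combinations(sorted(freq), 2):
    print(f"P({x} and {y}) = {prob_both(freq[x], freq[y], n):.4f}")
```

Multiplying these probabilities by a guess at the true user count would give estimated pairwise intersections for a PIE correction. The result can be no better than the independence assumption.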

Looking at the letter frequencies, I found them really strange. I wrote a tiny bit more code, using the English letter frequencies as given on Wikipedia to estimate how many hits I’d have gotten back on a normal set of words. If we assume Twitter user names have an average length of 7, we can print the expected numbers versus the actual numbers like this:

# From http://en.wikipedia.org/wiki/Letter_frequencies
freq = dict(a = 0.08167,
            b = 0.01492,
            c = 0.02782,
            d = 0.04253,
            e = 0.12702,
            f = 0.02228,
            g = 0.02015,
            h = 0.06094,
            i = 0.06966,
            j = 0.00153,
            k = 0.00772,
            l = 0.04025,
            m = 0.02406,
            n = 0.06749,
            o = 0.07507,
            p = 0.01929,
            q = 0.00095,
            r = 0.05987,
            s = 0.06327,
            t = 0.09056,
            u = 0.02758,
            v = 0.00978,
            w = 0.02360,
            x = 0.00150,
            y = 0.01974,
            z = 0.00074)

estimatedUserNameLen = 7

for L in sorted(count.keys()):
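    # Chance that one randomly drawn letter is not L.
    probNotLetter = 1.0 - freq[L]
    # Chance that a name of estimatedUserNameLen letters contains L at least once.
    probOneOrMore = 1.0 - probNotLetter ** estimatedUserNameLen
    expected = int(upperBoundOnUsers * probOneOrMore)
    print("%s: expected %6d, saw %6d." % (L, expected, count[L]))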

Which results in:

a: expected 231757, saw 108938.
b: expected  51531, saw  12636.
c: expected  92465, saw  13165.
d: expected 135331, saw  21516.
e: expected 316578, saw  14070.
f: expected  75281, saw   5294.
g: expected  68517, saw   8425.
h: expected 183696, saw   7108.
i: expected 204699, saw 160592.
j: expected   5500, saw   9226.
k: expected  27243, saw  12524.
l: expected 128942, saw   8112.
m: expected  80866, saw  51721.
n: expected 199582, saw  11019.
o: expected 217149, saw   9840.
p: expected  65761, saw   8139.
q: expected   3421, saw   1938.
r: expected 181037, saw  10993.
s: expected 189423, saw      0.
t: expected 250464, saw      0.
u: expected  91732, saw   8997.
v: expected  34301, saw   4342.
w: expected  79429, saw   6834.
x: expected   5392, saw   8829.
y: expected  67205, saw   8428.
z: expected   2666, saw   3245.

You can see there are wild differences here.

While it’s clearly not right to be multiplying the probability of one or more of each letter appearing in a name by the 515,931 figure (because that’s a major over-estimate), you might hope that the results would be more consistent and tell you how much of an over-estimate it was. But the results are all over the place.
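To put numbers on “all over the place”: if the only problem were the inflated 515,931 total, every letter’s saw/expected ratio would be roughly the same. This sketch computes that ratio for a handful of letters, using (expected, saw) pairs copied from the output above. The ratios run from under 0.05 for e and o to over 1.6 for j and x, a spread of more than thirty-fold.

```python
# (expected, saw) pairs copied from the output above, for a few letters.
results = {
    "a": (231757, 108938),
    "e": (316578, 14070),
    "i": (204699, 160592),
    "j": (5500, 9226),
    "o": (217149, 9840),
    "x": (5392, 8829),
    "z": (2666, 3245),
}

# If the only problem were an inflated total, every letter would show
# roughly the same saw/expected ratio. Sort them to see the spread instead.
ratios = {L: saw / expected for L, (expected, saw) in results.items()}
for L, r in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{L}: saw/expected = {r:.3f}")
spread = max(ratios.values()) / min(ratios.values())
print(f"largest ratio is {spread:.0f}x the smallest")
```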

I briefly considered writing some code to scrape the search results and calculate the co-occurrence frequencies (and the actual set of letters in user names). Then I noticed that the results don’t always add up. E.g., search for C and you’re told there are 13,190 results. But the results come 19 at a time and there are 660 pages of results (and 19 * 660 = 12,540, which is not 13,190).

At that point I decided not to trust Twitter’s results and to call it quits.

A promising direction (and blog post) had fizzled out. I was reminded of trying to use AltaVista to compute co-citation distances between web pages back in 1996. AltaVista was highly variable in its search results, which made it hard to do mathematics.

I’m blogging this as a way to stop thinking about this question and to see if someone else wants to push on it, or email me the answer. Doing the above only took about 10-15 mins. Blogging it took at least a couple of hours :-(

Finally, in case it’s not clear there are lots of assumptions in what I did. Some of them:

  • We’re not considering non-English letters (or things like underscores, which are common) in user names.
  • The mean length of Twitter user names is probably not 7.
  • Twitter search also returns users whose username doesn’t contain the searched-for letter (the letter appears in their display name instead).
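Of these assumptions, the name length is the easiest to test. This sketch uses the same independent-letters model as the code above, which treats every name as having exactly n letters. It recomputes the one-or-more probability for a common letter, e, and a rare one, j, over a range of assumed lengths. For every length from 4 to 12 the model’s P(e)/P(j) stays above 40. That is far from the observed 14,070/9,226 ≈ 1.5, so the length assumption alone can’t explain the mismatch.

```python
def prob_contains(f, n):
    """P(an n-letter name contains a letter of frequency f at least once),
    treating letters as independent draws (the model used in the code above)."""
    return 1.0 - (1.0 - f) ** n

# Two of the Wikipedia English letter frequencies quoted in this post.
freq = {"e": 0.12702, "j": 0.00153}

for n in range(4, 13, 2):
    pe, pj = prob_contains(freq["e"], n), prob_contains(freq["j"], n)
    print(f"n={n:2d}: P(e)={pe:.3f}  P(j)={pj:.4f}  P(e)/P(j)={pe / pj:.0f}")
```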

GPS serendipity: Florence Avenue, Sebastopol

Monday, July 14th, 2008

I drove from Oakland up to the O’Reilly Foo camp last Friday. The O’Reilly offices are just outside Sebastopol, CA. I stopped at an ATM and my GPS unit got totally confused. So I took a few turns at random and wound up on Florence Avenue. I drove a couple of hundred meters and started seeing big colorful structures out the front of many houses. They were so good I stopped, got out my camera, and took a whole bunch of pictures.

I talked to a man washing his car in his driveway. He told me that “Patrick” had created all the figures, and installed them on the front lawns. I got the impression that it was all free. Soon after, I found the house that was unmistakably Patrick’s. Seeing a man loading things into a pickup truck, I went up and asked if he was Patrick. It was him, and we had a friendly talk (mainly me telling him he was amazing). He gave me a calendar of his work.

There’s even an FC Barcelona structure. As I found out later, lots of people (of course) have seen these sculptures. When I got to Foo, there was one (image above) outside the O’Reilly office. Google for Patrick Amiot or Florence Avenue, Sebastopol and you’ll find much more. And Patrick has his own web site.


Sequoia Capital is the new Delphic Oracle

Tuesday, June 17th, 2008

In a belated attempt to educate myself by reading some of the things that many people study in high school, I’m reading The Histories of Herodotus. It’s highly entertaining and easy to read. I read The History of the Peloponnesian War by Thucydides a few years ago and enjoyed that even more. Herodotus is the more colorful, but the speeches and drama in Thucydides are fantastic.

There were lots of oracles in classical Greece, and elsewhere. Of the Greek oracles, the Delphic Oracle was, and still is, the best known. People (kings, dictators, emperors, wannabes) would send questions like “Should I invade Persia?” to the oracle and receive typically ambiguous or cryptic responses. We have a large number of famous oracular replies. Herodotus recounts how Croesus decided to test the various oracles by sending them all the same question, asking what he was doing on a certain day. The oracle at Delphi won hands down. Croesus then immediately put more pressing matters to the Delphic oracle, famously misinterpreted the pronouncements, and was duly wiped out by the Persians.

Imagine yourself in the position of the Delphic oracle. You’ve got all sorts of rulers and aspiring rulers constantly sending you their thoughts and questions, asking what you think. You’re in a unique position, simultaneously privy to the most secret potential plans of many powerful rulers. You really know what’s going on. You know what’s likely to succeed or to fail, and why. You get to give the thumbs up or thumbs down. By virtue of your position and the information flowing through your temple, you can direct traffic; you can shape and create history. You might even be tempted to profit from your knowledge. Your successful accurate pronouncements invariably reap you rich tribute.

OK, you can see where this is leading…

Sequoia Capital, and other well-known venture firms, have a somewhat similar position. They have thousands of leaders and wannabe leaders bringing them their detailed secret plans, proposing to mount armies, found cities, build empires, to attack the modern-day Persians, etc. By virtue of their unusual position they probably have a pretty good idea of what might work, and why. Using this knowledge, but without necessarily revealing sources, they can cryptically but assuredly state “oh, that’ll never work” or they can encourage ideas that are new and which they can see will somehow fit and succeed. If company X has consulted the oracle, disclosing a detailed plan to go left, and company Y plans to attack from the right, well… why not?

Entrepreneurs beg an audience, get a tiny slice of time to make their pitch, and occasionally receive rare clear endorsements. Much more frequently they are left to scratch their heads over cryptic, ambiguous and unexplained responses (and non-responses). You can bet the Delphic oracle didn’t sign NDAs either.

It’s stretching it too far to seriously claim that Sequoia is the modern-day equivalent of the Delphic oracle. But on the other hand, over 2500 years have elapsed, so you’d expect a few changes.

Random thoughts on Twitter

Monday, June 9th, 2008

I’ve spent a lot of time thinking about Twitter this year. Here are a few thoughts at random.

Obviously Twitter have tapped into something quite fundamental, which at a high level we might simply call human sociability. We humans are primates, though there’s a remarkably strong tendency to forget or ignore this. We know a lot about the intensely social lives of our fellow primate species. It shouldn’t come as a surprise that we like to Twitter amongst ourselves too.

Here are a couple of interesting (to me) reasons for the popularity of Twitter.

One is that many people are in some sense atomized because they now work in isolation. Technical people who can do their work and communicate over the internet probably see less of their peers than others do. That’s just a general point, not specific to Twitter or to 2008. It would have seemed unfathomably odd to humans 50 years ago to hear that many of us would be doing a large percentage of our work and social communication via machines, interacting with people we don’t otherwise know and rarely or never meet face to face. The rise of internet-based communication is obviously(?) helping to fill a gap created by this generational change.

The second point is specific to Twitter. Through brilliance or accident, the form of communication on Twitter is really special. Building a social network on asymmetric follower relationships that imply nothing is not something I would have predicted would succeed. Maybe it worked (or could just as easily have gone wrong) purely by chance. But I’m inclined to believe that there’s more to it than that. Perhaps we’re all secretly voyeurs, or stickybeaks (nosy-parkers). Perhaps we like to see one half of conversations and be able to follow along if we like. Perhaps there’s a small secret thrill to promiscuously following someone and seeing if they follow you back. I don’t know the answer, but as I said above I do think Twitter have tapped into something interesting and strong here. There’s a property of us, we simple primates, that the Twitter model has managed to latch onto.

I think Twitter should change the dynamics for new users by initially assigning them ten random followers. New users can easily follow others, but if no-one is following them….. why bother? New user uptake would be much higher if they didn’t have the (correct) feeling that they were somehow expected to Twitter in a vacuum. Announce a new program, called, e.g., Twitter Guides, and ask people to volunteer to be guides (i.e., followers) of newbies. Lend a hand, make new friends, maybe get some followers yourself, etc. Lots of people would click to be a Guide. I bet this would change Twitter’s adoption dynamics. If you study things like random graph theory and dynamical systems, you know that making small changes to probabilities (especially initial ones) can have a dramatic effect on overall structure. If Twitter is eventually to reach a mass audience (whatever that means), it should be uncontroversial that anything which significantly lowers the barrier for new users to get started is very important.

Twitter should probably fix their reliability issues sometime soon.

I say “probably” because reliability and scaling are obviously not the most important things. Twitter has great value. It must have, or it would have lost its users long ago.

There’s a positive side to Twitter’s unreliability. People are amazed that the site goes down so often. Twitter gets snarled up in ways that give rise to a wide variety of symptoms. The result seems to be more attention, and it somehow makes the service more charming. It’s like a bad movie that you remember long afterwards because it wasn’t good. We don’t take Twitter for granted and move on to the next service to pop up – we’re all busy standing around making snide remarks, playing armchair engineer, knowing that we too might face some of these issues, and talking, talking, talking. Twitter is a fascinating sight. Great harm is done by its unreliability, but the fact that their success so completely flies in the face of conventional wisdom is fascinating – and the fact that we find it so interesting and compelling a spectacle is fantastic for Twitter. They can fix the scaling issues, I hope. They should prove temporary. But the human side of Twitter, its character as a site, the site we stuck with and rooted for when times were so tough, the amazing little site that dropped to the canvas umpteen times but always got back to its feet, etc…. All that is permanent. If Twitter make it, they’re going to be more than just a web service. The public outages are like a rock musician or movie star doing something outrageous or threatening suicide – capturing attention. We’re drawn to the spectacle and the drama. We can’t help ourselves: it is our selves. We love it, we hate it, it brings us together to gnash our teeth when it’s down. But do we leave? Change the channel? No way.

Twitter is both the temperamental child rock star we love and, often, the medium by which we discuss it – an enviable position!

I’m reminded of a trick I learned during tens of thousands of miles of hitch-hiking. A great place to try for a lift is on a fairly high-speed curve on the on-ramp to the freeway / motorway / autopista / autoroute etc. Stand somewhere where a speeding car can only just manage a stop and only just manage to pull in away from the following traffic. Conventional wisdom tells you that you’ll never get a ride. But the opposite is true – you’ll get a ride extremely quickly. Invariably, the first thing the driver says when you get in is “Why on earth were you standing there? You’re very lucky I managed to stop. No-one would have ever picked you up standing there!” I’ve done this dozens of times. Twitter, being incredibly, unbelievably, frustratingly unreliable and running contrary to all received wisdom, is a powerful spectacle. The human psyche is a funny thing. That’s part of why it’s probably impossible to foretell success when mass adoption is required.

If I were running Twitter, apart from working to get the service to be more reliable, I’d be telling the engineering team to log everything. There’s a ton of value in the data flowing into Twitter.

Just as Google took internet search to a new level by link analysis, there’s another level of value in Twitter that I don’t think has really begun to be tapped yet.

PageRank, at least as I understand its early operation, ran a kind of iterative relaxation algorithm assigning and passing on credit via linked pages. A similar thing is clearly possible with Twitter, and some people have commented on this or tried to build little things that assign some form of score to users. But I think there’s a lot more that can be done. Because the Twitter API isn’t that powerful (mainly because you’re largely limited to querying as a single authorized user) and certainly because it’s rate-limited to just 70 API calls an hour, this sort of analysis will need to be done by Twitter themselves. I’m sure they’re well aware of that. Rate limiting probably helps them stay up, but it also means that the truly interesting and valuable stuff can’t be done by outsiders. I have no beef with that – I just wish Twitter would hurry up and do some of it.
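To make the relaxation idea a little more concrete, here is a minimal sketch in Python of PageRank-style credit passing over a follow graph. It is not Twitter’s (or Google’s) actual algorithm: it assumes the whole graph is available as a dict mapping each user to the set of users they follow, which, as noted above, only Twitter itself could realistically assemble. The function name `follow_rank`, the damping factor of 0.85 and the fixed iteration count are illustrative choices, and the users are made up.

```python
# Minimal sketch, not Twitter's or Google's actual algorithm.
# Hypothetical data model: follows maps each user to the set of users they follow.


def follow_rank(follows, damping=0.85, iterations=50):
    """Return a PageRank-style score for every user in a follow graph."""
    users = set(follows)
    for followed in follows.values():
        users |= followed
    if not users:
        return {}
    n = len(users)
    rank = {user: 1.0 / n for user in users}

    for _ in range(iterations):
        # Everyone gets an equal share of the "random jump" credit.
        new_rank = {user: (1.0 - damping) / n for user in users}

        # Users who follow nobody have no one to pass credit to, so their
        # credit is spread evenly over all users to keep the total at 1.
        dangling = sum(rank[user] for user in users if not follows.get(user))
        for user in users:
            new_rank[user] += damping * dangling / n

        # Each follower splits its credit equally among everyone it follows.
        for follower, followed in follows.items():
            if not followed:
                continue
            share = damping * rank[follower] / len(followed)
            for user in followed:
                new_rank[user] += share

        rank = new_rank
    return rank


follows = {
    "alice": {"bob", "carol"},
    "bob": {"carol"},
    "carol": {"alice"},
    "dave": {"carol"},
}
ranks = follow_rank(follows)
for user, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{user}: {score:.3f}")
```

In this toy graph carol, followed by three of the four users, ends up with the highest score, while dave, whom nobody follows, gets only the random-jump share. Because each user’s credit is split across everyone they follow, a follow from someone following ten people passes on far more than a follow from someone following 10K.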

Some examples in no order:

  • The followers to following ratio of a Twitter user is obviously a high-level measure of that user’s “importance” (in some Twitter sense of importance). But there’s more to it than that. Who are the followers? Who do they follow, who follows them? Etc. This leads immediately back to Google PageRank.
  • If a user gets followed by many people and doesn’t follow those people back, what does it say about the people involved? If X follows Y and Y then goes to look at a few pages of X’s history but does not then follow X, what do we know?
  • If X has 5K followers and re-tweets a tweet of Y, how many of X’s followers go check out and perhaps follow Y? What kind of people are these? (How do you advertise to them, versus others?)
  • Along the lines of co-citation analysis, Twitter could build up a map showing you who you might follow. I.e., you can get pairwise distances between users X and Y by considering how many people they follow in common and how many they follow not-in-common. That would lead to a “people you should be following but aren’t” kind of suggestion.
  • Even without co-citation analysis (or similar), Twitter should be able to tell me about people that many of the people I follow are following but whom I am not following. I’d find that very useful.
  • Twitter could tell me why someone chooses to follow me. What were they looking at (if anything) before they decided to follow me? I.e., were they browsing the following list of someone else? Did they see my user name mentioned in a Tweet? Did they come in from an outside link? Would a premium Twitter user pay to have that information?
  • Twitter has tons of links. They know the news as it happens. They could easily create a news site like Digg.
  • In some sense the long tail of Twitter is where the value is. For instance, it doesn’t mean much if a user following 10K others follows someone. But if someone is following just 10 people, it’s much more significant. There’s more information there (probably). The Twitter mega users are in some way uninteresting – the more people they have following them and the more they follow, the less you really know (or care) about them. Yes, you could probably figure out more if you really wanted to, but if someone has 10K followers all you really know is that they’re probably famous in some way. If they add another 100 followers it’s no big deal. (I say all this a bit lightly and generally – the details might of course be fascinating and revealing – e.g., if you notice Jason Calacanis and Dave Winer have suddenly started @ messaging each other again it’s like IRC coming back from a network split :-))
  • Similarly if someone with a very high followers to following ratio follows a Twitter user who has just a couple of followers, it’s a safe bet that those two are somehow friends with a pre-existing relationship.
  • I bet you could do a pretty good job of putting Twitter users into boxes just based on their overall behavior, something like the 16 Myers-Briggs categories. Do you follow people back when they follow you? Do you @ answer people who @ address you (and Twitter knows when you’ve seen the original message)? Do you send @ messages to people (and how influential are those people)? Do those people @ you back (and how influential those people are says something about how interesting / provocative you are)? Do you follow tons and tons of people? Do you follow people and then un-follow them if they don’t follow you back? Do you follow random links in other people’s Twitters, and are those links accompanied by descriptive text or tinyurl links? Do you @ message people after you follow their links? Do your Twitter times follow a strict pattern, or are you on at all hours, or suddenly spending days without Twittering? Do you visit and just read much more than you tweet? How much old stuff do you read? Do you tend to talk in public or via DM? Are your tweets public? All that without even considering the content of your Twitters.
  • Could Twitter become a search engine? That’s not a 100% serious question, but it’s worth considering. I don’t mean just making the content of all tweets searchable, I mean with some sort of ranking algorithm, again perhaps akin to PageRank. If you somehow rank results by the importance or closeness of the user whose tweets match the search terms, you might have something interesting.
  • Twitter also presumably know who’s talking about whom in the DM backchat. They can’t use that information in an obvious way, but it’s of high value.
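The co-citation idea and the “people many of the people I follow are following” idea from the list above can both be sketched with plain Python sets. This again assumes a hypothetical dict mapping each user to the set of users they follow. The Jaccard similarity used here is just one way to turn “how many they follow in common and how many they follow not-in-common” into a number, and the function names and users are made up for illustration.

```python
from collections import Counter

# Hypothetical data model: follows maps each user to the set of users they follow.


def follow_similarity(follows, x, y):
    """Jaccard similarity of the sets of users that x and y follow.

    1.0 means they follow exactly the same people, 0.0 means no overlap.
    A pairwise distance is simply 1 minus this value.
    """
    fx, fy = follows.get(x, set()), follows.get(y, set())
    union = fx | fy
    if not union:
        return 0.0
    return len(fx & fy) / len(union)


def suggest_follows(follows, me, limit=5):
    """Users followed by many of the people `me` follows, but not by `me`."""
    mine = follows.get(me, set())
    counts = Counter()
    for friend in mine:
        for candidate in follows.get(friend, set()):
            # Skip myself and anyone I already follow.
            if candidate != me and candidate not in mine:
                counts[candidate] += 1
    return counts.most_common(limit)


follows = {
    "me": {"ann", "ben", "cat"},
    "ann": {"ben", "dan", "eve"},
    "ben": {"dan", "eve"},
    "cat": {"dan", "me"},
}
print(suggest_follows(follows, "me"))  # [('dan', 3), ('eve', 2)]
```

Counting over friends-of-friends only touches the neighbourhood of one user, which is cheap; computing similarities between all pairs of users is the expensive part and is the kind of thing that would have to run inside Twitter.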

I could go on for hours, but that’s more than enough for now. I don’t feel like any of the above list is particularly compelling, but I do think the list of nice things they could be doing is extremely long and that Twitter have only just begun (at least publicly) to tap into the value they’re sitting on.

I think Google should buy Twitter. They have what Twitter needs: 1) engineering and scale, 2) link analysis and algorithm brilliance, and 3) they’re in a position to monetize the value illustrated above (via their search engine, that already has ads) without pissing off the Twitter community by e.g., running ads on Twitter. What percentage of Twitter users also use Google? I bet it’s very high.

Google maps miles off on Barcelona hotel

Tuesday, April 22nd, 2008

I’m a big fan of Google maps.

But sometimes they get things very very wrong. In January I posted this example of them getting the location of the San Francisco international airport way wrong.


The screenshot linked above is supposed to show the location of the hotel Princesa Sofia in Barcelona. They have the address right, the zip code looks about right, but the location is about 30 miles off.

Caveat turista.

Individuality, transparency, and the cult of impersonality

Thursday, April 3rd, 2008

I’ve been talking to people about raising money for Fluidinfo over the last 5 months. Along the way I’ve had plenty of time to reflect on the process. I have a series of blog posts saved up. They’re mainly about oddities and discrepancies between appearance and reality. I plan to write them up gradually. Here’s one I wrote earlier this year but which I never finished. It’s still unpolished – but what the hell. This is a blog, after all.

In September 2007, Fred Wilson posted asking whether VCs should blog. The first thing I thought about when I read his title was transparency.

Increased transparency is a side-effect of easier communication between people. There are many relatively opaque human institutions and professions that have persisted for decades or centuries, relying on the fact that their subjects or customers were unable to communicate easily, to self-organize, to be widely heard, etc. Exclusionary access to knowledge is the foundation of power. As barriers to communication begin to fall, openness and transparency increase. Cracks appear in the walls. At that point anything can happen. The typical response is a heavy-handed crackdown to maintain or regain control. Examples are so numerous and widespread that any small sample would be woefully inadequate. This never-ending dynamic is just a part of the human condition and the nature of power.

But in some arenas, especially when there’s a market or in repeated games (a rich area of game theory), there may be a competitive advantage to (usually) smaller players who act disruptively to deliberately increase transparency. Those players differentiate themselves by (often informally) defecting from the (often tacit) group of gatekeepers. Advantages may include potential clients tending to trust you more, wide attention, and better opportunities. If increased transparency gets a foothold, there can then follow a kind of race to the bottom as players reveal increasingly more formerly-inside knowledge. This is also a drama that has been played out many times, and it’s fascinating and educational to watch.

We’re now seeing the cracks open wide in the VC world. The rise of the VC blogger has provided us with hundreds of eye-holes through which we can get some view of the works. The VC bloggers are implicitly calling out their less open colleagues, challenging them to open up. An extreme example is Venture Hacks, written by VC industry insiders, whose aim is to “open source” VC strategy in order to aid entrepreneurs. Then there’s The Funded, which shook the VC world as formerly isolated entrepreneurs got together (and in relative privacy, no less!) to exchange opinions and experiences. While The Funded is undoubtedly biased, and based on small sample sizes, part of the fuss was unquestionably about control.

I awoke yesterday with another thought about transparency, why VCs should blog, and the curious dynamics of the VC/entrepreneur dance.

VCs should also blog because it allows entrepreneurs to see who they are as people. That may sound trite, but I think it’s quite interesting.

I’ve attended probably 50 events where one or more VCs takes the stage and gives some kind of a presentation. The presentations are very often excruciatingly dull. That’s because they’re filled to bursting with VC clichés. Even when VCs make an effort to differentiate themselves they tend to use clichés! They’re active investors, they have deep experience, broad contacts, want to help management, etc. I sat in the audience at Le Web a couple of weeks ago while several investors were on stage doing their thing. I wound up laughing with the guy who sat next to me, who I’d never met before. We rolled eyes at each other, passed notes, and ended up whispering nasty and disrespectful comments during the presentation. We were obviously there because we were interested to learn more, but we were served up standard VC fare. Steak and eggs.

The interesting thing is that entrepreneurs are a wildly idiosyncratic bunch. One would therefore expect that they’d tend to highly appreciate signs of character and individuality in VCs. Meanwhile VCs tend to keep things buttoned down and insist on making dreary presentations.

If nothing else, the existing dynamics are amusing. Wild-eyed, power-hungry, idiosyncratic, unconventional, and often deeply weird entrepreneurs are trying to act straight, to project an image of reliability, stability, balance, good sense, etc., in order to get funded. Simultaneously, the VC companies the entrepreneurs are evaluating, and who partly rely on being attractive to entrepreneurs, go to lengths to homogenize themselves – in the process washing out the very thing that an entrepreneur might find most reassuring.

There’s opportunity in this discrepancy. VCs who blog about themselves, in addition to talking about their industry and flogging their portfolio companies, may have tapped into this. Allowing entrepreneurs to see what you’re like as a person is a differentiator.