
Nova Spivack really gets it

12:12 October 23rd, 2007 by terry. Posted under tech.

Usually when I hear about the thinking behind new web technology I dismiss it pretty quickly. That’s not because I don’t like what people are doing or don’t find it interesting; I just find that almost everything is some kind of application built on an old framework. I’m much more interested in trying to change the framework itself.

I’ve been aware of Radar Networks for some time. I talked to Tim O’Reilly about Fluidinfo in March 2007, and he compared what I was saying to Nova’s claims for Radar. Now that Radar have released Twine, I’ve gone and read some of Nova’s blog postings. I probably should have done that ages ago.

It turns out we agree on many things. Here’s one in particular: in an article entitled “Understanding The Semantic Web: A Response to Tim O’Reilly’s Recent Defense of Web 2.0”, he has a section entitled “THE SEMANTIC WEB IS THE DATA WEB”, which corresponds nicely to my posting “why data (information representation) is the key to the coming semantic web”.

That’s pretty refreshing. And there’s more, including well-aligned and practical thinking about the word “semantic” and various other words.

I may say more in another posting.


Orwell on intellectuals

15:01 October 22nd, 2007 by terry. Posted under books.

One of several things I admire about Orwell is that he doesn’t pull any punches and he turns his guns on all comers. Here’s a nice passage from a 1943 review of Beggar My Neighbour by Lionel Fielden.

In the last twenty years western civilization has given the intellectual security without responsibility, and in England, in particular, it has educated him in scepticism while anchoring him almost immovably in the privileged class. He has been in the position of a young man living on an allowance from a father whom he hates. The result is a deep feeling of guilt and resentment, not combined with any genuine desire to escape. But some psychological escape, some form of self-justification there must be, and one of the most satisfactory is transferring nationalism. During the nineteen-thirties the normal transference was to soviet Russia, but there are other alternatives, and it is noticeable that pacifism and anarchism, rather than Stalinism, are now gaining ground among the young. These creeds have the advantage that they aim at the impossible and therefore in effect demand very little. If you throw in a touch of oriental mysticism and Buchmanite raptures over Gandhi, you have everything that a disaffected intellectual needs. The life of an English gentleman and the moral attitudes of a saint can be enjoyed simultaneously.

And there’s more.


Resurrection

14:46 October 22nd, 2007 by terry. Posted under me.

Today I resurrected as many of my old postings as I could find. I think I have about half. I’m still saddened by the loss of all those words. I can never believe it when I hear of writers who burn things, throw them away, etc. I even keep scraps of paper from 20 or more years ago that I wrote on. I don’t know why I place such value on simple words, but I do.

Anyway, I missed my blog. I miss some of the postings that are now gone forever.

I’m going to blog every single day, at least for today. Watch me.


I’m back

12:15 October 21st, 2007 by terry. Posted under me.

Well, I’m back.

The previous instantiation of this blog was washed away in a storm in early August. A server got hacked and in my hurry to have it decommissioned I forgot to pull out the MySQL database for my blog. I’m still annoyed at myself – partly because it’s so public and basic an error, but mainly because I care so much about words and now all those words are gone. A recovery operation using the Google cache and the Wayback Machine got me about half of the posts back. I may add them here at some point. I’m pissed that I lost so much stuff, and there’s no-one to blame but myself.


smell the fear

14:24 June 30th, 2007 by terry. Posted under companies, tech.

Entertainment retailers are not happy that Prince is giving away his upcoming album, via a deal with the Mail on Sunday newspaper. Their reaction is one of abject fear with a sprinkling of nonsense:

It would be an insult to all those record stores who have supported Prince throughout his career

All those stores making all that money, colluding to fix prices, over all those years, and they were just doing it to support the artists! My heart bleeds for them.

You can almost smell the fear.


forgetting how to dial international

14:56 June 26th, 2007 by terry. Posted under me, tech.

A weird thing happened to me this morning.

I needed to call someone in Portugal. I reached for the trusty land line, checked for a dial tone (so old-fashioned), grabbed the number, and went to dial. Then I realized I didn’t remember the prefix to dial to get out of the country!

That’s pretty amazing. I’ve been living “overseas” (whatever that means) for over 20 years, and I’ve made plenty of international calls in that time.

I’ve been using Skype for international calls almost exclusively for at least 2 years.

Concepts like “dial tone” and “international dialing prefix” are soon going to appear extremely quaint.

I took my kids to a flea market a couple of months ago. We ran across a rotary phone. Although they knew it was a phone, they couldn’t figure out how you were supposed to dial. Why not just push a button? Dial tone? Access code? Why not just push a (mouse) button?


my O’Reilly number

11:18 June 25th, 2007 by terry. Posted under books, companies, tech.

I like O’Reilly technical books. Back in 1987 I put together some notes to write a book on the vi editor, and later considered submitting the idea to O’Reilly. I used to think I knew just about everything there was to know about vi, at least as a user, and I spent a small amount of time fiddling with its code to fix some limitations. Of course, now that I’m a hardened emacs user, it’s a good thing I didn’t blot my career early by writing a book on a crappy editor like vi.

I just did a quick count of the O’Reilly titles on my shelves: I have fifty-five.

And you?


literary arbitrage

14:58 June 20th, 2007 by terry. Posted under books, companies.

The two books I just bought on Amazon.com cost me $37.74, plus shipping to Spain of $13.47, for a total of $51.21.

The same books are available on Amazon.co.uk for a total of £28.35, plus shipping to Spain of £5.97 and VAT of £1.37 for a grand total of £35.69 or USD $71.15.

So you can pay $51 to have the books shipped (in theory) from the US, or pay roughly 40% more and have them shipped (in theory) from the UK. The difference in shipping time isn’t much either, in practice. Even if mailing from the UK were free and there were no VAT, it would still be cheaper to have the books sent from the US.
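For anyone who wants to check the sums, here is a small Python 3 sketch of the arithmetic. The prices are the ones quoted above; the exchange rate is not stated anywhere, so it is inferred from the quoted £35.69 and $71.15 totals and should be treated as an assumption.

```python
# Prices as quoted in the post.
us_books, us_shipping = 37.74, 13.47                # USD
uk_books, uk_shipping, uk_vat = 28.35, 5.97, 1.37   # GBP
uk_total_usd_quoted = 71.15                         # USD, as quoted

us_total = us_books + us_shipping                   # USD
uk_total = uk_books + uk_shipping + uk_vat          # GBP

# Assumption: the exchange rate is whatever turns the quoted GBP total
# into the quoted USD total.
usd_per_gbp = uk_total_usd_quoted / uk_total

# How much more the UK order costs, as a fraction of the US order.
premium = uk_total_usd_quoted / us_total - 1

# UK book price alone (free shipping, no VAT), converted to USD.
uk_books_only_usd = uk_books * usd_per_gbp

print("US total:            $%.2f" % us_total)
print("UK total:            GBP %.2f" % uk_total)
print("Implied USD per GBP: %.4f" % usd_per_gbp)
print("UK premium:          %.0f%%" % (premium * 100))
print("UK books only:       $%.2f" % uk_books_only_usd)
```

Under that implied rate the UK premium comes out just under 40%, and the UK book price on its own is still higher than the full US total including shipping.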

The dollar hit a 26-year low against the pound in April of this year (2007). If it keeps falling and Amazon don’t adjust their pricing, I might start a side business in literary arbitrage.


better together

12:35 June 20th, 2007 by terry. Posted under books, companies, tech.

Amazon, intentionally or not, have done a great job with their special offer feature that suggests a second book to you and offers you both at the same time for a discount.

One could argue that it’s not in their interests to offer you a second book that you would buy later anyway at its normal price. (Yes, you can argue that it’s implicitly in their interest because it creates goodwill.)

At least in this customer’s experience, they do a great job of offering me things that I might want but never offering me anything I already know that I want. You might think that that’s because I always immediately buy everything I want, but that’s not true.

Today they slipped up and offered me something I knew in advance that I also wanted. I went to look at Glut: Mastering Information Through the Ages, and after I clicked to see the book, I wondered if they might just maybe offer me Everything Is Miscellaneous: The Power of the New Digital Disorder. And… they did.

That’s a first for me. I buy lots of books on Amazon, and I’ve never been offered something I knew I wanted.

Of course it’s also in their interests to occasionally slip up like this. Then people write blog posts praising them and saying how good their algorithms are.

At least for me, Amazon’s “better together” is almost pitch perfect. They consistently land tempting titles just outside the small ring of books I’ve already decided I’m going to buy at some later point. (Note that making special offers like this is very different from the far simpler “customers who bought X also bought Y” – which is just a lookup.) It’s easy to imagine Amazon’s algorithms trying to figure out what I’m almost certainly going to buy anyway, and what I might well buy but probably won’t, and picking something tantalizing and just over the edge, just out of reach. What a great way to push readers’ boundaries while making more sales and not leaving money on the table.

Whatever’s going on, and whatever you think might be going on, it’s clearly not simple to keep customers happy and enthusiastic via special offers that do not sacrifice money the customer would in fact spend anyway.


Pondering the T&C of Amazon’s S3 and EC2

03:53 June 19th, 2007 by terry. Posted under companies, tech.

I’ve spent many hours reading about Amazon’s S3 and EC2 services since they were announced. They’re certainly very attractive, and they are being put to heavy use by many companies. There’s a list of examples over on O’Reilly Radar. Don MacAskill of SmugMug gave a great talk at ETech about SmugMug’s use of S3. SmugMug have something like 200TB in storage at S3.

I think S3 and EC2 are fantastic and innovative offerings from Amazon. I’d love to use them for my own project.

But if you read the Web Services Licensing Agreement, it’s quite worrying. Or at least it should be worrying for anyone whose potentially S3/EC2-reliant service may one day rub Amazon the wrong way.

Here are a few extracts:

5. You agree to provide such additional information and/or other materials related to your Application as reasonably requested by us or our affiliates to verify your compliance with this Agreement.

What does “other materials” include? Source code?

If your Application is available as an online solution, you acknowledge and agree that we (and/or our affiliates) may crawl or otherwise monitor your Application for the purpose of verifying your compliance with this Agreement, and that you will not seek to block or otherwise interfere with such crawling or monitoring (and that we and/or our affiliates may use technical means to overcome any methods used on your Application to block or interfere with such crawling or monitoring).

“Otherwise monitor” is pretty creepy and all-encompassing. I’m supposed to give Amazon blanket permission to monitor my service in any way they choose? I think it’s fair enough for them to reserve the right and means to verify that I’m in accordance with the agreed T&C, but the above language is…. well, see below.

If your Application is a desktop solution, you agree to furnish a copy of your Application upon request for the purpose of verifying your compliance with this Agreement.

What does this mean? Source code?

And then we get to the real kicker:

8) If your Application is determined (for any reason or no reason at all, in our sole discretion) to be unsuitable for Amazon Web Services, we may suspend your access to Amazon Web Services or terminate this Agreement at any time, without notice.

Wow.

But big net-and-web-friendly Amazon, they wouldn’t just pull the plug on something they didn’t like. Would they? The experience of Zlio might make you wonder, as might the experience of Alexaholic (now Statsaholic).

From what little I know of those two cases, I don’t see a reason to condemn Amazon. But they do give pause, and section 8 of the T&C is frightening. There’s more in the agreement that I find vague (just what is an Amazon Property?), but that’s enough examples for now.

IANAL, but I’ve worked on and negotiated dozens of contracts. What we have here is a contract for services drawn up by the lawyers of just one party. This is the kind of shot across the bows you can take when your side gets to draft the contract, and it inevitably comes back with Unacceptable or Rejected all over the place, especially when you’ve egregiously over-reached. You know you’re over-reaching, of course. You get to frame the terms of the contract, which is why it’s so nice to do the first draft.

And yes, OK, Amazon is offering a service, they can define the price and the T&C as they see fit, and you can like it or lump it. But there’s another way, which is to push back a little.

S3 and EC2, and most likely future Amazon offerings, are important. They change a lot and they deserve to be widely used. It’s worth fighting about because they’re so great, because the T&C could be fixed, and because drafters of contractual terms like these expect you to push back.

Potential customers shouldn’t have to worry that Amazon might cut them off without warning and without reason. We should instead speak up and push for a better deal. Because right now the terms of the deal are totally one-sided. Amazon are big enough and mature enough and smart enough to know that it’s in their interests to make S3, EC2 and the rest of their web services as big as possible, and of course they know that their T&C are over-reaching.

If you’re building something that Amazon may one day decide they don’t like, or that they want to compete with, I’d be careful about using S3 or EC2. What if Amazon come along one day and offer to buy you for a deliberately lowball price—or else? What if [insert evil villain] calls up Jeff Bezos one day and makes a deal to have your service cut off? That’s going to be totally opaque to you, and you have no recourse. What if Amazon is bought by XXX, who then decide to cut you off? This may all sound farfetched, but these sorts of things do happen.

Comment #2 on the Zlio RW/W page I referenced above makes an important point. Amazon’s platform is akin to an operating system on which services can be built. Amazon promotes it like a platform. But they reserve the right to dump you unceremoniously, without notice, and without reason. Come on Amazon! We may be fragile startups dying to use your services, but we’re not idiots. If you want to build a platform and have people use it, do it properly. Otherwise, you’re just reserving the right to act like Microsoft after they finally woke up and realized that they could write applications for their OS too, and proceeded to use ugly means to wipe out competitors – to their ongoing and deserved detriment. But even Microsoft didn’t have an EULA that said they could take the OS away from you any time they felt like it.

Given a choice between Amazon cutting the price on S3 again and having them revise their T&C, I’d much rather the latter. But if we all silently accept their T&C, there’s no reason for them to revisit.

A few small changes could make Amazon’s web services irresistible.


Sort uniq sort revisited, in modern Python

00:16 June 17th, 2007 by terry. Posted under python, tech.

Just after I started messing around with Python, my friend Nelson posted about writing some simple Python to speed up the UNIX sort | uniq -c | sort -nr idiom.

I played with it a bit trying to speed it up, and wrote several versions in Python and Perl. This was actually just my second Python program.

The other night I was re-reading some newer Python (2.5) docs and decided to try applying the latest and greatest Python tools to the problem. I came up with this:

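from sys import stdin
from operator import itemgetter
from collections import defaultdict

total = 0                 # number of input lines seen
data = defaultdict(int)   # line -> occurrence count; missing keys start at 0
freqCache = {}            # count -> fraction of total, shared by lines with equal counts

# Single pass over the input: tally each distinct line.
for line in stdin:
    data[line] += 1
    total += 1

# Sort the (line, count) pairs on the count alone, highest first,
# using the built-in comparison rather than a Python-level comparator.
for line, count in sorted(data.iteritems(), key=itemgetter(1), reverse=True):
    frac = freqCache.setdefault(count, float(count) / total)
    # Each line still ends in its own newline, so the trailing comma
    # stops print from adding a second one.
    print "%7d %f %s" % (count, frac, line),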

In trying out various options, I found that defaultdict(int) is hard to beat, though using defaultdict with an inline lambda: 0 or a simple def x(): return 0 are competitive.
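For illustration, here is a minimal Python 3 sketch of the three default factories mentioned above. The helper name `zero` and the sample lines are made up for the example. It only shows that the three variants count identically and says nothing about their relative speed.

```python
from collections import defaultdict


def zero():
    """Hypothetical stand-in for the 'def x(): return 0' variant."""
    return 0


# Made-up sample input, standing in for lines read from stdin.
lines = ["b\n", "a\n", "b\n"]

# Three ways of giving a missing key the starting value 0.
counters = {
    "int": defaultdict(int),
    "lambda": defaultdict(lambda: 0),
    "def": defaultdict(zero),
}

for counts in counters.values():
    for line in lines:
        counts[line] += 1

for name, counts in counters.items():
    print(name, dict(counts))
```

All three end up with the same counts; any difference between them is purely in how quickly the factory gets called for each new key.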

In the solution I sent to Nelson, I simply made a list of the data keys and sorted it, passing lambda a, b: -cmp(data[a], data[b]) as a sort comparator. Nelson pointed out that this was a newbie error, as it stops Python from taking full advantage of its blazingly fast internal sort algorithm. But overall the code was quite a bit faster than Nelson’s approach, which sorted a list of tuples.

So this time round I was pretty sure I’d see a good improvement. The code above just sorts on the counts, and it lets sort use its own internal comparator. Plus it just runs through the data dictionary once to sort and pull out all results – no need to fish into data each time around the print loop. So it seemed like the best of both worlds.
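The approach described above (count in one pass, then sort the items on their counts with itemgetter) can be sketched in Python 3 roughly as follows. This is an illustrative port, not the code that was timed: the function name `count_lines` and the sample data are assumptions, and the frequency cache is left out because plain division is enough to show the idea.

```python
from collections import defaultdict
from operator import itemgetter


def count_lines(lines):
    """Return (count, fraction, line) tuples, most frequent line first."""
    data = defaultdict(int)
    total = 0
    for line in lines:
        data[line] += 1
        total += 1
    # Sort the (line, count) pairs on the count only, using the built-in
    # comparison; lines with equal counts stay in first-seen order.
    ordered = sorted(data.items(), key=itemgetter(1), reverse=True)
    return [(count, count / total, line) for line, count in ordered]


# Made-up sample input standing in for stdin.
sample = ["pear\n", "apple\n", "pear\n", "fig\n", "pear\n", "apple\n"]
for count, frac, line in count_lines(sample):
    print("%7d %f %s" % (count, frac, line), end="")
```

To run it over real input, pass `sys.stdin` instead of the sample list.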

But this code turns out to be about 10% slower (on my small set of inputs, each of 200-300K lines) than the naive version, which extracts data.keys, sorts it using the above lambda, and then digs back into data when printing the results.
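The naive version is only described here, not shown, so the following is a reconstruction in Python 3 rather than the original code. Python 3 has no cmp() and no cmp argument to sort, so the sketch assumes functools.cmp_to_key and the usual (a > b) - (a < b) stand-in for cmp; the name `count_lines_naive` and the sample data are also made up.

```python
from collections import defaultdict
from functools import cmp_to_key


def count_lines_naive(lines):
    """Sort the keys with a comparator, then look each count up again."""
    data = defaultdict(int)
    total = 0
    for line in lines:
        data[line] += 1
        total += 1

    def by_count_desc(a, b):
        # Equivalent of -cmp(data[a], data[b]): higher counts sort first.
        return -((data[a] > data[b]) - (data[a] < data[b]))

    keys = list(data.keys())
    keys.sort(key=cmp_to_key(by_count_desc))
    # Dig back into data for every key when building the output rows.
    return [(data[k], data[k] / total, k) for k in keys]


# Made-up sample input standing in for stdin.
sample = ["pear\n", "apple\n", "pear\n", "fig\n", "pear\n", "apple\n"]
for count, frac, line in count_lines_naive(sample):
    print("%7d %f %s" % (count, frac, line), end="")
```

Whether this is faster or slower than sorting on itemgetter will depend on the Python version and the input, so the 10% figure above should not be assumed to carry over.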

It looks nice though.


reflective bandwagon

02:26 June 14th, 2007 by terry. Posted under me, tech.

Here’s another thing I’ve had enough of: The graphic design bandwagon of which this image is a perfect example:

[image: zoho]

This technique is like a rash all over the web. It’s one thing to jump on the bandwagon and make your site look all cool and Web2.0-esque, but there’s another thing about these images that bugs me.

I don’t understand them.

There’s something about them that just doesn’t work for me. When I look at an image like the above, it somehow doesn’t sit right in my mind. I mean, where’s the light coming from? That’s not a shadow, it’s a reflection. It’s bouncing off that nice shiny black highly-reflective surface. So I guess the solution is that there is a bright light somewhere behind me and above my head. Is that it?

Images that have a shadow next to them or behind them are so much easier to deal with. But that was the bandwagon 10 years ago. Now we have the Web2.0 effect in full color, not boring gray. It’s romantic, it’s engaging, and it’s coming right at you, like, like, yes like a perfect reflection on a cool and glassy alpine lake.

And it’s….. everywhere.


it’s long

02:00 June 14th, 2007 by terry. Posted under books, me.

There are a few things that bug me on the internet.

One is that people often warn each other that articles are long, or apologize for writing long blog entries. There’s nothing inherently wrong with that. When it turns out though that these items are just a couple of screenfuls, you start to wonder what we’re all coming to. And yes, I know, it’s the 21st century, we’re all living at internet speed now, who’s got the time, etc.

OTOH, a word like “long” can be used to convey information. You can look at the word “long” and form some idea of just how long the long thing might be. And these days, it ain’t very long. Maybe we’re in the middle of a transition in which a word comes to mean its opposite.

Marc Andreessen recently began to blog, and the blogosphere is all abuzz. He writes tolerably well, and he’s got interesting comments on many things, but there’s a real downside: his posts are really long. Here’s a random example of someone who agrees.

That’s weird.

From where I sit, if someone writes well and is interesting or otherwise provocative, you wish they’d write more, not less. You want it to be long. Half a dozen web pages is not long. I read In Search of Lost Time last year. It took me 6 months and at 4300 pages or so, I think it qualifies as long. I’m reading Orwell’s letters, essays, and journalism. At 2200 pages, it seems fairly long too. I wished Proust was longer. I’ll probably wish Orwell was longer too. I tried reading The Decline and Fall of the Roman Empire (3500 pages), but the 7-volume “leatherette” set I bought stinks of old cigarette smoke and I couldn’t bear it.

How did we get from “long” meaning something like War and Peace (1100 pages) or Anna Karenin (850 pages) all the way to a 6-page (single narrow column) blog posting (with plenty of white space)?

What word should we now use for things that are longer than 6 pages or that require more than 5 minutes to read? Epic?


resorting to regular expressions

22:53 June 13th, 2007 by terry. Posted under python, tech.

I was going to write a much longer set of thoughts on moving to Python, but I don’t have time. Instead I’ll summarize by saying that I programmed for 28 years in various languages before switching to Python nearly 2 years ago.

I like Python. A lot. And there are multiple reasons, which I may go into another time.

One thing that has struck me as very interesting is my use of regular expressions. I came to Python after doing a lot of work in Perl (about 8 years). In Perl I used regular expressions all the time. And I mean every single day, many times a day. I like regular expressions. I understand pretty well how they work. I found multiple errors in the 2nd edition of Mastering Regular Expressions. I made a 20% speedup to version 4.72 of Grepmail with a trivial change to a regex. I put both GNU and Henry Spencer regex support into strsed. I use them in emacs lisp programming and in general day-to-day emacs usage, and in their limited form on the shell command line and in grep.

So given that regular expressions are so powerful, that I well know how to wield them, and that I did so perhaps ten thousand times during those 8 years of Perl, you might expect that I’d use them frequently in Python.

But that’s not the case.

In two years of writing Python almost every day, I think I’ve probably only used regular expressions about 10 times!

I’m not going to speculate now on why that might be the case. I’m writing this partly to see if others (in my huge circle of readers) have experienced something similar. I was prompted to write by an svn check-in message of Daniel’s last night. He said:

You know things are bad when you find yourself resorting to regular expressions

And I knew exactly what he meant. When I find myself reaching for the Python pocket guide to refresh my memory on using Python regular expressions, it’s such an unusual event (especially given the contrast mentioned above) that I find myself wondering if maybe I’m doing something really inefficient and unPythonic.


Comments on Productivity and being Always-On

02:41 June 11th, 2007 by terry. Posted under companies, tech.

Antonio over at the Onda has a post up about Productivity and being Always-On. He’s got comments turned off, so I’m going to make a few here.

First of all, I really enjoy Antonio’s writings. That’s why I read his blog. But today I just need to push back a little :-) I think all four of Antonio’s points about what you can expect to go wrong are rather weak and/or misleading.

Let’s go through them.

Power (this one was on me for being unprepared). Between Spain and England, I discovered 3 different plug types. What is more, if you travel with a laptop and a phone (more than one device to plug in) and check in late, good luck getting the hotels to have anything to lend you to plug your American appliances in.

You could substitute the U.S. for Spain or the UK in this sentence and it would remain true. There’s actually a good deal of standardized plug size across Europe. Yes, the UK and the US (and some other countries) do things differently. But Spain is part of a large swathe of countries that follow a standard. I could mention the use of 110 volt devices, but I won’t. But I do suggest, just for fun, going to the reception of some US hotels and asking them if they have a European plug converter they could lend you. Or try asking for two. I’ve lived 10 years in the US and 10 years in Europe and I have a fairly strong opinion about where you’re more likely to find accommodating help for stuff that requires regular employees of a company to even be aware of the existence of other countries.

Consistent SMS/data on your cellphone. Having just switched to a GSM network, I was really excited by the prospect of 3G networks and zippy-fast mobile data. While voice worked everywhere, SMS and data did not. In fact, SMS was the flakiest of all of the services that I’ve come to rely on— I could receive messages almost everywhere, but I had at best 50% odds of being able to send them.

I’d put this down to (probably) having a mixture of Europe and US carriers involved. I also spent nearly 5 years working in the cell phone industry and know first hand from various carriers that passing SMS between their networks is (or was a few years back) hugely flaky. Someone from a US carrier (I don’t remember which) told me that, officially, US-Euro SMS was not supported by their network but that messages did sometimes “leak” through, but they weren’t sure how! In Spain I find SMS extremely reliable, and I send probably 200/month. When in the US I also have not-infrequent problems, in both directions.

And as far as the wi-fi is concerned, it does seem to be fairly ubiquitous, but in 100% of the cases it was expensive and encumbered by either its billing mechanism or by some lame proxy server setup that blocked most of the useful Internet services you’d want to get access to.

The same could easily be said of the US, and probably every other country. This is too general a complaint – I’ve encountered expensive brain-dead wifi all over the place. One pleasant exception is the airport at Las Vegas, with free wifi. Plus see below.

Overall Internet speed. Finally, the speed of “broadband” connections (especially in Spain) is painful. In this new world of rich Internet applications, it’s easy to forget that we’ve only just been able to get to the point where we can use them in the US and that this is far from a given for other parts of the world. For instance, in Spain Tabblo.com was completely unusable, and even Gmail was severely hobbled by the dearth of bandwidth.

This is also very weak. Who was the ISP? In what city? What sort of bandwidth was the contract? How many different places and ISPs did you try out? It’s like saying “I went to the US and my broadband connection sucked, so therefore broadband connections suck in the US”. FWIW, I’ve had an ADSL connection with a fixed IP address in Barcelona for about 7 years. For several of those years, during which it cost me about US$30/month, the CEO of the company I worked for in Manhattan couldn’t get any DSL connection at all to his Manhattan apartment. I mean nothing. He was using a modem for years while I had a much zippier always-on connection. These days I have a theoretical max of 1Mb up and 20Mb down, and the last time I tested it, it was running at about 6Mb. A connection at that speed can be had from Ya for just US$26/month. I ssh into servers and the connections stay up until I close them (often many days). I can even work with Tabblo. I know dozens of people here who use Gmail as their only mail source, and I’ve seen it working just fine, without noticeable delay.

That’s it for now I guess. While I’m sure Antonio’s experiences happened, they read like someone comparing their comfortable home setup with what they experienced as a foreign tourist. Of course those experiences will be very different, even if the underlying services are identical. You see the same thing when tourists complain about how expensive a country is. Yes, you can pay 12 euros (US$16!) for a large (and I mean beer stein large) Fanta on the Ramblas. But that says more about you than it does about Spain :-)


Orwell on T. S. Eliot and the path from existential angst to serial entrepreneur

18:06 June 7th, 2007 by terry. Posted under books, companies, me.

I like George Orwell. A tired fool got me started on the four-volume collection of Orwell’s essays, journalism, and letters. It’s great. Among many things I could say, one is that you know you’re reading someone damned good if you’re fascinated by their thoughts on something you formerly had no interest or experience in. There’s the essay on Dickens that I mentioned earlier, essays on cheap vulgar postcards, boys’ magazines, and much else besides. Gore Vidal is similarly compelling, and I think I would take his collected essays even over those of Orwell. Christopher Hitchens is similarly provocative but not in the same class as a writer. Very few are.

Today I was reading an Orwell review of three T. S. Eliot poems. I’m not into Eliot and I’m not into poetry. Like Gore Vidal’s, Orwell’s reviews are wonderful – balanced and surgical skewerings. Anyway, I came across the following, which I enjoyed enormously and decided to post:

But the trouble is that conscious futility is something only for the young. One cannot go on ‘despairing of life’ into a ripe old age. One cannot go on and on being ‘decadent’, since decadence means falling and one can only be said to be falling if one is going to reach the bottom reasonably soon. Sooner or later one is obliged to adopt a positive attitude towards life and society. It would be putting it too crudely to say that every poet in our time must either die young, enter the Catholic Church, or join the Communist party, but in fact the escape from the consciousness of futility is along those general lines. There are other deaths besides physical death, and there are other sects and creeds besides the Catholic Church and the Communist Party, but it remains true that after a certain age one must either stop writing or dedicate oneself to some purpose not wholly aesthetic. Such a dedication necessarily means a break with the past:

every attempt
Is a wholly new start, and a different kind of failure

Because one has only learnt to get the better of words
For the thing one no longer has to say, or the way in which
One is no longer disposed to say it. And so each venture
Is a new beginning, a raid on the inarticulate
With shabby equipment always deteriorating
In the general mess of imprecision of feeling,
Undisciplined squads of emotion.

Apart from the fact that I am much too impatient to read poetry, one of my problems is that I never have any idea what it’s about. But at least the above is clear. It wonderfully captures the inevitable progression from the troubled search for meaning of existential youth to the amorphous struggles of the serial entrepreneur.


Why I HATE my 17″ MacBook Pro (Intel)

23:24 May 16th, 2007 by terry. Posted under companies, tech.

I’m trying to work. But right now I’m not working, I’m writing this.

Q: Why am I writing this instead of working?

A: Because my MacBook Pro has decided to spend 2 minutes showing the spinning color wheel in the application I was trying to use.

I used to have a 17″ G4 Powerbook with 2GB of RAM. When people asked me what I thought of it I was always very positive and enthusiastic in my reply. Then I got a 17″ MacBook Pro, and I hate it and wouldn’t recommend it to anyone.

Every single day, probably half a dozen times, I’ll be using some application (in this case Aquamacs, a version of Emacs for the Mac) when, for no apparent reason, the mouse cursor will change to the spinning pizza wheel of death, and remain that way for up to a couple of minutes. This happens to me in Firefox too, so it’s definitely not an Emacs thing. It probably happens in other apps too; I haven’t paid as much attention as I probably should have.

I’m too busy swearing at the machine.

Anyway….. what the hell!? I wish I could just throw this piece of junk away and get something that just works. I’m planning to head back to Linux next time around. I can’t believe that this machine could have made it out the door. But I need it too much to be able to just send it back, so I’m stuck, and I resort to cursing and blogging.

I have other complaints too. I run a make clean that removes a bunch of files (around 20M of stuff, not that many files). Sometimes the rm command will just sit there for 15 or 20 seconds before completing. What’s it thinking about? I run unit tests on Python code all the time. Sometimes the python command to run the tests will just sit there, the other day it was taking almost a minute to launch python. Yes, I know, I can go look at what’s going on on my machine, and I do, and sometimes, yes, it’s busy. But this is just ridiculous. I’ve been using UNIX for 25 years and I haven’t had to wait for things like this since the early 80’s (and then on a machine I was sharing with up to 128 others). It’s infuriating to have the expensive latest and greatest laptop and then have it perform like a dog.

FWIW, the machine is probably a year old. Probably these seem like small problems, but I think Apple really screwed up, and there are plenty of complaints online about problems with these laptops. Blast them to hell.

Now back to work – if I can manage to get a cursor in Emacs, that is.


iteranything

12:22 May 7th, 2007 by terry. Posted under python, tech. | Comments Off on iteranything

Here’s a Python function to iterate over pretty much anything. In the extremely unlikely event that anyone uses this code, note that if you pass keyword arguments the order of the resulting iteration is not defined (as with iterating through any Python dictionary).

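from itertools import chain
import types

def iteranything(*args, **kwargs):
    # Walk the positional arguments first, then the keyword values.
    for arg in chain(args, kwargs.values()):
        if isinstance(arg, str):
            # Strings are iterable, but treat each one as a single item.
            yield arg
        elif isinstance(arg, types.FunctionType):
            # Call plain functions (including generator functions) with no
            # arguments and iterate over whatever they return.
            for i in arg():
                yield i
        else:
            try:
                i = iter(arg)
            except TypeError:
                # Not iterable: yield the object itself.
                yield arg
            else:
                # Iterable (list, generator, ...): flatten it one level.
                for item in i:
                    yield item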

if __name__ == '__main__':
    def gen1():
        yield 1
        yield 2

    def gen2():
        yield 3
        yield 4

    assert list(iteranything()) == []
    assert list(iteranything([])) == []
    assert list(iteranything([[]])) == [[]]
    assert list(iteranything([], [])) == []
    assert list(iteranything(3)) == [3]
    assert list(iteranything(3, 4)) == [3, 4]
    assert list(iteranything(3, 4, dog='fido')) == [3, 4, 'fido']
    assert list(iteranything(3, 4, func=gen1)) == [3, 4, 1, 2]
    assert list(iteranything(3, 4, func=gen1())) == [3, 4, 1, 2]
    assert list(iteranything(3, 4, func=iteranything)) == [3, 4]
    assert list(iteranything(3, 4, func=iteranything())) == [3, 4]
    assert list(iteranything(3, 4, func=iteranything('a', 'b', c='z'))) == \
        [3, 4, 'a', 'b', 'z']
    assert list(iteranything(3, 4, func=iteranything('a',
        iteranything(5, 6), c='z'))) == [3, 4, 'a', 5, 6, 'z']
    assert list(iteranything(None, 'xxx', True)) == [None, 'xxx', True]
    assert list(iteranything(3, 4, [5, 6])) == [3, 4, 5, 6]
    assert list(iteranything(3, 4, gen1, gen2)) == [3, 4, 1, 2, 3, 4]
    assert list(iteranything(3, 4, gen1(), gen2())) == [3, 4, 1, 2, 3, 4]
    assert list(iteranything(1, 2, iteranything(3, 4), None)) == \
        [1, 2, 3, 4, None]
    assert list(iteranything(1, 2, iteranything(3, iteranything(1, 2,
        iteranything(3, 4), None)))) == [1, 2, 3, 1, 2, 3, 4, None]

the blind leading the blind?

01:22 May 1st, 2007 by terry. Posted under companies, me. | 2 Comments »

At some point I read a description of why entrepreneurs pitching VCs is a bad mix: because you have people who can’t explain anything meeting people who can’t understand anything. That’s unfair all round of course, but still…. Having done some of this and had people just not “get” it, you can’t but wonder why they don’t get it. Of course part of it can be about the pitch itself, the presentation, etc etc. But even if you suppose everything is ideal, investing (whether as a founder or as a financial backer) is an act of faith.

There’s lots of evidence for this. To begin with, if it were a science and there were quantifiable measures, those things would presumably be known and you’d have a lot fewer startups and a lot fewer investors, simply because failure would be rare.

Until recently I thought there was more of a lack of vision on the investor side. But now I’m not so sure. For example, the Google guys were apparently running around search engine companies trying to sell their idea (vision? early startup?) for $1M. They couldn’t find a buyer. What an extraordinary lack of….. what? On the one hand you want to laugh at those idiot companies (and VCs) who couldn’t see the huge value. OK, maybe. But the more extraordinary thing is that Larry Page and Sergey Brin couldn’t see it either! That’s pretty amazing when you think about it. Even the entrepreneurs couldn’t see the enormous value. They somehow decided that $1M would be an acceptable deal. Talk about a lack of vision and belief.

So you can’t really blame the poor VCs or others who fail to invest. If the founding tech people can’t see the value and don’t believe, who else is going to?

I usually enjoy Paul Graham’s essays. In a recent one, The Hacker’s Guide To Investors, he says:

Risk is always proportionate to reward. So the most successful startup of all is likely to have seemed an extremely risky bet at first, and that is exactly the kind VCs won’t touch.

Which is also pretty interesting. In some ways it’s like a horoscope – appealing to every dreamer who believes they’re sitting on a billion-dollar idea. But if the “always” in the above quote is true, and you are in fact sitting on something that will bring huge rewards, then by definition it must appear hugely risky.

If you throw in the initial observation that even the founders cannot assess value, then I think you get three things. One: a feeling of taking huge risk is a necessary part of building something that’s hugely rewarding (i.e., if you don’t have that feeling, then you’re probably not building such a thing). Two: even if you are building such a thing, you cannot know it. You just have to believe. Three: if you cannot know it, but can see huge risk, you can’t expect investors to see things any differently. So to get them to invest you really have to make them believers too.


why data (information representation) is the key to the coming semantic web

01:51 March 19th, 2007 by terry. Posted under me, representation, tech. | 5 Comments »

In my last posting I argued that we should drop all talk about Artificial Intelligence when discussing the semantic web, web 3.0, etc., and acknowledge that in fact it’s all about data. There are two points in that statement. I was scratching an itch and so I only argued one of them. So what about my other claim?

While I’m not ready to describe what my company is doing, there’s a lot I can say about why I claim that data is the important thing.

Suppose something crops up in the non-computational “real-world” and you decide to use a computer to help address the situation. An inevitable task is to take the real-world situation and somehow get it into the computational system so the computer can act on it. Thus one of the very first tasks we face when deciding to use a computer is one of representation. Given information in the real world, we must choose how to represent it as data in a computer. (And it always is a choice.)

So when I say that data is important, I’m mainly referring to information representation. In my opinion, representation is the unacknowledged cornerstone of problem solving and algorithms. It’s fundamentally important and yet it’s widely ignored.

When computer scientists and others talk about problem solving and algorithms, they usually ignore representation. Even in the genetic algorithms community, in which representation is obviously needed and is a required explicit choice, the subject receives little attention. But if you think about it, in choosing a representation you have already begun to solve the problem. In other words, representation choice is a part of problem solving. But it’s never talked about as being part of a problem-solving algorithm. In fact though, if you choose your representation carefully the rest of the problem may disappear or become so trivial that it can be solved quickly by exhaustive search. Representation can be everything.

To illustrate why, here are a couple of examples.

Example 1. Suppose I ask you to use a computer to find two positive integers that have a sum of 15 and a product of 56. First, let’s pick some representation of a positive integer. How about a 512-bit binary string for each integer? That should cover it, I guess. We’ll have two of them, so that will be 1,024 bits in our representation. And here’s an algorithm, more or less: repeatedly set the 1,024 bits at random, add the corresponding integer values, to see if they sum to 15. If so, multiply them and check the product too.

But wait, wait, wait… even my 7-year-old could tell you that’s not a sensible approach. It will work, eventually. The state search space has 2^1024 candidate solutions. Even if we test a billion billion billion of them per second, it’s going to take much longer than a billion years.
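To get a feel for how much longer, here is a rough back-of-the-envelope check in Python. The variable names are mine, and I assume a 365-day year and read “a billion billion billion” as 10^27 tests per second. It is only a sketch of the arithmetic, nothing more.

```python
# Rough arithmetic for the 1,024-bit representation.
candidates = 2 ** 1024                   # size of the search space
tests_per_second = 10 ** 27              # "a billion billion billion" (assumed reading)
seconds_per_year = 365 * 24 * 60 * 60    # ignoring leap years

# Years needed to walk the whole space once (integer division is fine here).
years_needed = candidates // (tests_per_second * seconds_per_year)

# The number is too big to read comfortably, so show how many digits it has.
print("digits in the number of years:", len(str(years_needed)))
print("more than a billion years?", years_needed > 10 ** 9)
```

The number of years has well over 200 digits, so “much longer than a billion years” is putting it mildly.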

Instead, we could think a little about representation before considering what would classically be called the algorithm. Aha! It turns out we could actually represent each integer using just 4 bits, without risk of missing the solution. Then we can use our random (or an exhaustive) search algorithm, and have the answer in about a billionth of a second. Wow.

Of course this is a deliberately extreme example. But think about what just happened. The problem and the algorithm are the same in both of the above approaches. The only thing that changed was the representation. We coupled the stupidest possible algorithm with a good representation and the problem became trivial.
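As a minimal sketch of the same point, here is the dumb search written so that the number of bits is the only thing you can vary. The function name find_pair and its parameters are my own invention for illustration, and I’ve used exhaustive rather than random search so the result is repeatable.

```python
from itertools import product

def find_pair(bits, target_sum=15, target_product=56):
    """Try every pair of positive integers that fits in `bits` bits."""
    limit = 2 ** bits
    for a, b in product(range(1, limit), repeat=2):
        if a + b == target_sum and a * b == target_product:
            return a, b
    return None  # no pair in this representation satisfies both constraints

# With 4 bits per integer there are only 15 * 15 = 225 pairs to try.
print(find_pair(4))  # (7, 8)
```

Calling find_pair(512) would run exactly the same code, but the loop would not finish in any useful amount of time. Nothing about the “algorithm” changed, only the representation.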

Example 2. Consider the famous Eight Queens problem (8QP). That’s considerably harder than the above problem. Or is it?

Let’s represent a chess board in the computer using a 64-bit string, and make sure that exactly 8 bits are set to one to indicate the presence of a queen. We’ll devise a clever algorithm for coming up with candidate 64-bit solutions, and write code to check them for correctness. But the search space is 2^64, and that’s not a small number. It could easily take a year to run through that space, so the algorithm had better be pretty good!

But wait. If you put a queen in row R and column C, no other queen can be in row R or column C. Following that line of thinking, you can see that all possibly valid solutions can be represented by a permutation of the numbers 1 through 8. The first number in the permutation gives the column of the queen in the first row, and so on. There are only 8! = 40,320 possible arrangements that need to be checked. That’s a tiny number. We could program it up, use exhaustive search as our algorithm, and have a solution in well under a second!
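Here’s a rough sketch of that exhaustive search in Python, along with the sizes of the competing search spaces. The names is_solution and eight_queens are mine, and math.comb needs Python 3.8 or later. Treat it as one possible way to write it, not a tuned solver.

```python
from itertools import permutations
from math import comb, factorial

def is_solution(cols):
    """cols[r] is the column of the queen in row r.

    Rows and columns are distinct by construction (cols is a permutation),
    so only the diagonals need to be checked.
    """
    n = len(cols)
    return all(
        abs(cols[r1] - cols[r2]) != r2 - r1
        for r1 in range(n)
        for r2 in range(r1 + 1, n)
    )

def eight_queens(n=8):
    """Exhaustively test every permutation of the columns 1..n."""
    return [p for p in permutations(range(1, n + 1)) if is_solution(p)]

solutions = eight_queens()

print("all 64-bit boards:        ", 2 ** 64)
print("boards with 8 bits set:   ", comb(64, 8))
print("permutations of 1..8:     ", factorial(8))
print("valid solutions found:    ", len(solutions))
print("one of them:              ", solutions[0])
```

Even restricting the 64-bit strings to those with exactly 8 bits set leaves over four billion boards, against 40,320 permutations. The search itself is as dumb as it gets: try everything and keep what passes.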

Once again, a change of representation has a radical impact on what people would normally think of as the problem. But the problem isn’t changing at all. What’s happening is that when you choose a representation you have actually already begun to solve the problem. In fact, as the examples show, if you get the representation right enough the “problem” pretty much vanishes.

These are just two simple examples. There are many others. You may not be ready to generalize from them, but I am.

I think fundamental advances based almost solely on improved representation lie just ahead of us.

I think that if we adopt a better representation of information, things that currently look impossible may even cease to look like problems.

There are other people who seem to believe this too, though perhaps implicitly. Web 3.0, whatever that is, can bring major advances without anyone needing to come up with new algorithms. Given a better representation we could even use dumb algorithms (though perhaps not pessimal algorithms) and yet do things that we can’t do with “smart” ones. I think this is the realization, justifiably exciting, that underlies the often vague talk of “web 3.0”, the “read/write web”, the “data web”, “data browsing”, the infinite possible futures of mash ups, etc.

This is why, to pick the most obvious target, I am certain that Google is not the last word in search. It’s probably not a smart idea to try to be smarter than Google. But if you build a computational system with a better underlying representation of information you may not need to be particularly intelligent at all. Things that some might think are related to “intelligence”, including the emergence of a sexy new “semantic” web, may not need much more than improved representation.

Give a 4-year-old a book with a 90%-obscured picture of a tiger in the jungle. Ask them what they see. Almost instantly they see the tiger. It seems incredible. Is the child solving a problem? Does the brain or the visual system use some fantastic algorithm that we’ve not yet discovered? Above I’ve given examples of how better representation can turn things that a priori seemed to require problem solving and algorithms into things that are actually trivial. We can extend the argument to intelligence. I suspect it’s easy to mistake someone with a good representation and a dumb algorithm as being somehow intelligent.

I bet that evolution has produced a representation of information in the brain that makes some problems (like visual pattern matching) non-existent. I.e., not problems at all. I bet that there’s basically no problem solving going on at all in some things people are tempted to think of as needing intelligence. The “algorithm”, and I hesitate to use that word, might be as simple as a form of (chemical) hill climbing, or something even more mundane. Perhaps everything we delight in romantically ascribing to native “intelligence” is really just a matter of representation.

That’s why I believe data (aka information representation) is so extremely important. That’s where we’re heading. It’s why I’m doing what I’m doing.