Archive for June, 2007

smell the fear

Saturday, June 30th, 2007

Entertainment retailers are not happy that Prince is giving away his upcoming album, via a deal with the Mail on Sunday newspaper. Their reaction is one of abject fear with a sprinkling of nonsense:

It would be an insult to all those record stores who have supported Prince throughout his career

All those stores making all that money, colluding to fix prices, over all those years, and they were just doing it to support the artists! My heart bleeds for them.

You can almost smell the fear.

forgetting how to dial international

Tuesday, June 26th, 2007

A weird thing happened to me this morning.

I needed to call someone in Portugal. I reached for the trusty land line, checked for a dial tone (so old fashioned), grabbed the number, and went to dial. Then I realized I didn’t remember the prefix to dial to get out of the country!

That’s pretty amazing. I’ve been living “overseas” (whatever that means) for over 20 years, and I’ve made plenty of international calls in that time.

I’ve been using Skype for international calls almost exclusively for at least 2 years.

Concepts like “dial tone” and “international dialing prefix” are soon going to appear extremely quaint.

I took my kids to a flea market a couple of months ago. We ran across a rotary phone. Although they knew it was a phone, they couldn’t figure out how you were supposed to dial. Why not just push a button? Dial tone? Access code? Why not just push a (mouse) button?

my O’Reilly number

Monday, June 25th, 2007

I like O’Reilly technical books. Back in 1987 I put together some notes to write a book on the vi editor, and later considered submitting the idea to O’Reilly. I used to think I knew just about everything there was to know about vi, at least as a user, and I spent a small amount of time fiddling with its code to fix some limitations. Of course now being a hardened emacs user, it’s a good thing I didn’t blot my career early by writing a book on a crappy editor like vi.

I just did a quick count of the O’Reilly titles on my shelves: I have fifty-five.

And you?

literary arbitrage

Wednesday, June 20th, 2007

The two books I just bought on Amazon’s US site cost me $37.74, plus shipping to Spain of $13.47, for a total of $51.21.

The same books are available on Amazon’s UK site for a total of £28.35, plus shipping to Spain of £5.97 and VAT of £1.37, for a grand total of £35.69 or USD $71.15.

So you can pay $51 to have the books shipped (in theory) from the US, or pay roughly 40% more and have them shipped (in theory) from the UK. The difference in shipping time isn’t much either, in practice. Even if the price of mailing in the UK were free and there were no VAT, it would still be cheaper to have books sent from the US.
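
As a rough check of the numbers above, here is a small Python sketch. The post does not state the exchange rate, so it is inferred here from the quoted totals (£35.69 = $71.15); that inference and all the variable names are assumptions made for illustration.

```python
# Prices as quoted in the post.
us_books, us_shipping = 37.74, 13.47                # USD
uk_books, uk_shipping, uk_vat = 28.35, 5.97, 1.37   # GBP
uk_total_usd_quoted = 71.15                         # USD, as quoted

us_total = us_books + us_shipping
uk_total_gbp = uk_books + uk_shipping + uk_vat

# Assumption: the rate is inferred from the quoted totals, not given in the post.
usd_per_gbp = uk_total_usd_quoted / uk_total_gbp

# How much more the UK order costs, as a fraction of the US order.
premium = uk_total_usd_quoted / us_total - 1

# The UK order with free shipping and no VAT: the book prices alone, in dollars.
uk_books_only_usd = uk_books * usd_per_gbp

print("US total: USD %.2f" % us_total)
print("UK total: GBP %.2f" % uk_total_gbp)
print("Implied rate: %.2f USD per GBP" % usd_per_gbp)
print("UK premium: %.0f%%" % (premium * 100))
print("UK books alone: USD %.2f" % uk_books_only_usd)
```

At the implied rate, the UK book prices alone come to about $56.50, which is still more than the $51.21 US total including shipping.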

The dollar hit a 26-year low against the pound in April of this year (2007). If it keeps falling and Amazon don’t adjust their pricing, I might start a side business in literary arbitrage.

better together

Wednesday, June 20th, 2007

Amazon, intentionally or not, have done a great job with their special offer feature that suggests a second book to you and offers you both at the same time for a discount.

One could argue that it’s not in their interests to offer you a second book that you would buy later anyway at its normal price. (Yes, you can argue that it’s implicitly in their interest because it creates goodwill.)

At least in this customer’s experience, they do a great job of offering me things that I might want but never offering me anything I already know that I want. You might think that that’s because I always immediately buy everything I want, but that’s not true.

Today they slipped up and offered me something I knew in advance that I also wanted. I went to look at Glut: Mastering Information Through the Ages, and after I clicked to see the book, I wondered if they might just maybe offer me Everything Is Miscellaneous: The Power of the New Digital Disorder. And… they did.

That’s a first for me. I buy lots of books on Amazon, and I’ve never been offered something I knew I wanted.

Of course it’s also in their interests to occasionally slip up like this. Then people write blog posts praising them and saying how good their algorithms are.

At least for me, Amazon’s “better together” is almost pitch perfect. They consistently land tempting titles just outside the small ring of books I’ve already decided I’m going to buy at some later point. (Note that making special offers like this is very different from the far simpler “customers who bought X also bought Y” – which is just a lookup.) It’s easy to imagine Amazon’s algorithms trying to figure out what I’m almost certainly going to buy anyway, and what I might well buy but probably won’t, and picking something tantalizing and just over the edge, just out of reach. What a great way to push readers’ boundaries while making more sales and not leaving money on the table.

Whatever’s going on, and whatever you think might be going on, it’s clearly not simple to keep customers happy and enthusiastic via special offers that do not sacrifice money the customer would in fact spend anyway.

Pondering the T&C of Amazon’s S3 and EC2

Tuesday, June 19th, 2007

I’ve spent many hours reading about Amazon’s S3 and EC2 services since they were announced. They’re certainly very attractive, and they are being put to heavy use by many companies. There’s a list of examples over on O’Reilly Radar. Don MacAskill of SmugMug gave a great talk at ETech about SmugMug’s use of S3. SmugMug have something like 200TB in storage at S3.

I think S3 and EC2 are fantastic and innovative offerings from Amazon. I’d love to use them for my own project.

But if you read the Web Services Licensing Agreement, it’s quite worrying. Or at least it should be worrying for anyone whose potentially S3/EC2-reliant service may one day rub Amazon the wrong way.

Here are a few extracts:

5. You agree to provide such additional information and/or other materials related to your Application as reasonably requested by us or our affiliates to verify your compliance with this Agreement.

What does “other materials” include? Source code?

If your Application is available as an online solution, you acknowledge and agree that we (and/or our affiliates) may crawl or otherwise monitor your Application for the purpose of verifying your compliance with this Agreement, and that you will not seek to block or otherwise interfere with such crawling or monitoring (and that we and/or our affiliates may use technical means to overcome any methods used on your Application to block or interfere with such crawling or monitoring).

“Otherwise monitor” is pretty creepy and all-encompassing. I’m supposed to give Amazon blanket permission to monitor my service in any way they choose? I think it’s fair enough for them to reserve the right and means to verify that I’m in accordance with the agreed T&C, but the above language is…. well, see below.

If your Application is a desktop solution, you agree to furnish a copy of your Application upon request for the purpose of verifying your compliance with this Agreement.

What does this mean? Source code?

And then we get to the real kicker:

8) If your Application is determined (for any reason or no reason at all, in our sole discretion) to be unsuitable for Amazon Web Services, we may suspend your access to Amazon Web Services or terminate this Agreement at any time, without notice.


But big net-and-web-friendly Amazon, they wouldn’t just pull the plug on something they didn’t like. Would they? The experience of Zlio might make you wonder, as might the experience of Alexaholic (since renamed Statsaholic).

From what little I know of those two cases, I don’t see a reason to condemn Amazon. But they do give pause, and section 8 of the T&C is frightening. There’s more in the agreement that I find vague (just what is an Amazon Property?), but that’s enough examples for now.

IANAL, but I’ve worked on and negotiated dozens of contracts. What we have here is a contract for services drawn up by the lawyers of just one party. This is the kind of shot across the bows you can take when your side gets to draft the contract, and it inevitably comes back with Unacceptable or Rejected all over the place, especially when you’ve egregiously over-reached. You know you’re over-reaching, of course. You get to frame the terms of the contract, which is why it’s so nice to do the first draft.

And yes, OK, Amazon is offering a service, they can define the price and the T&C as they see fit, and you can like it or lump it. But there’s another way, which is to push back a little.

S3 and EC2, and most likely future Amazon offerings, are important. They change a lot and they deserve to be widely used. It’s worth fighting about because they’re so great, because the T&C could be fixed, and because drafters of contractual terms like these expect you to push back.

Potential customers shouldn’t have to worry that Amazon might cut them off without warning and without reason. We should instead speak up and push for a better deal. Because right now the terms of the deal are totally one-sided. Amazon are big enough and mature enough and smart enough to know that it’s in their interests to make S3, EC2 and the rest of their web services as big as possible, and of course they know that their T&C are over-reaching.

If you’re building something that Amazon may one day decide they don’t like, or that they want to compete with, I’d be careful about using S3 or EC2. What if Amazon come along one day and offer to buy you for a deliberately lowball price—or else? What if [insert evil villain] calls up Jeff Bezos one day and makes a deal to have your service cut off? That’s going to be totally opaque to you, and you have no recourse. What if Amazon is bought by XXX, who then decide to cut you off? This may all sound farfetched, but these sorts of things do happen.

Comment #2 on the Zlio RW/W page I referenced above makes an important point. Amazon’s platform is akin to an operating system on which services can be built. Amazon promotes it like a platform. But they reserve the right to dump you unceremoniously, without notice, and without reason. Come on Amazon! We may be fragile startups dying to use your services, but we’re not idiots. If you want to build a platform and have people use it, do it properly. Otherwise, you’re just reserving the right to act like Microsoft after they finally woke up and realized that they could write applications for their OS too, and proceeded to use ugly means to wipe out competitors – to their ongoing and deserved detriment. But even Microsoft didn’t have an EULA that said they could take the OS away from you any time they felt like it.

Given a choice between Amazon cutting the price on S3 again and having them revise their T&C, I’d much rather the latter. But if we all silently accept their T&C, there’s no reason for them to revisit.

A few small changes could make Amazon’s web services irresistible.

Sort uniq sort revisited, in modern Python

Sunday, June 17th, 2007

Just after I started messing around with Python, my friend Nelson posted about writing some simple Python to speed up the UNIX sort | uniq -c | sort -nr idiom.

I played with it a bit trying to speed it up, and wrote several versions in Python and Perl. This was actually just my second Python program.

The other night I was re-reading some newer Python (2.5) docs and decided to try applying the latest and greatest Python tools to the problem. I came up with this:

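# Python 2.5 syntax (print statement, dict.iteritems).
from sys import stdin
from operator import itemgetter
from collections import defaultdict

total = 0                  # number of input lines seen
data = defaultdict(int)    # line -> occurrence count; missing keys start at 0
freqCache = {}             # count -> count / total

for line in stdin:
    data[line] += 1
    total += 1

# Sort (line, count) pairs by count, highest first, using the built-in comparison.
for line, count in sorted(data.iteritems(), key=itemgetter(1), reverse=True):
    frac = freqCache.setdefault(count, float(count) / total)
    # Each line still ends with its newline, so the trailing comma suppresses print's own.
    print "%7d %f %s" % (count, frac, line),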
In trying out various options, I found that defaultdict(int) is hard to beat, though using defaultdict with an inline lambda: 0 or a simple def x(): return 0 are competitive.
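
The three default factories mentioned above produce the same counts. A minimal sketch in Python 3 syntax (the helper names `zero` and `count_lines` and the sample input are made up for illustration, not taken from the original code):

```python
from collections import defaultdict


def zero():
    """Named equivalent of `lambda: 0`."""
    return 0


def count_lines(lines, factory):
    """Count occurrences of each line, using `factory` to supply the starting value."""
    counts = defaultdict(factory)
    for line in lines:
        counts[line] += 1
    return dict(counts)


sample = ["a\n", "b\n", "a\n", "c\n", "a\n"]

# Same input, three different ways of saying "missing keys start at 0".
results = [count_lines(sample, factory) for factory in (int, lambda: 0, zero)]
```

This only shows that the three are interchangeable; it says nothing about their relative speed.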

In the solution I sent to Nelson, I simply made a list of the data keys and sorted it, passing lambda a, b: -cmp(data[a], data[b]) as a sort comparator. Nelson pointed out that this was a newbie error, as it stops Python from taking full advantage of its blazingly fast internal sort algorithm. But…. overall the code was quite a bit faster than Nelson’s approach which sorted a list of tuples.

So this time round I was pretty sure I’d see a good improvement. The code above just sorts on the counts, and it lets sort use its own internal comparator. Plus it just runs through the data dictionary once to sort and pull out all results – no need to fish into data each time around the print loop. So it seemed like the best of both worlds.

But, this code turns out to be about 10% slower (on my small set of inputs, each of 200-300K lines) than the naive version which extracts data.keys, sorts it using the above lambda, and then digs back into data when printing the results.

It looks nice though.
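
For anyone wanting to repeat the comparison on Python 3, here is a sketch of both variants. It assumes Python 3, where `cmp` and `iteritems` no longer exist, so the comparator goes through `functools.cmp_to_key`. The function names are mine, and the sketch makes no claim about which variant is faster on a current interpreter.

```python
import sys
from collections import defaultdict
from functools import cmp_to_key
from operator import itemgetter


def count_lines(lines):
    """Map each line to the number of times it occurs."""
    counts = defaultdict(int)
    for line in lines:
        counts[line] += 1
    return counts


def by_items(counts):
    """Sort (line, count) pairs on the count, as in the code above."""
    return sorted(counts.items(), key=itemgetter(1), reverse=True)


def by_keys_with_cmp(counts):
    """The 'naive' variant: sort the keys with a comparator, then look each count up again."""
    def compare(a, b):
        # Python 3 has no cmp(); (x > y) - (x < y) is the usual replacement.
        return -((counts[a] > counts[b]) - (counts[a] < counts[b]))

    keys = sorted(counts, key=cmp_to_key(compare))
    return [(key, counts[key]) for key in keys]


def report(pairs, total, out=sys.stdout):
    """Write count, fraction of the total, and the line itself (which keeps its newline)."""
    for line, count in pairs:
        out.write("%7d %f %s" % (count, count / total, line))


sample = ["b\n", "a\n", "b\n", "c\n", "b\n", "a\n"]
counts = count_lines(sample)
report(by_items(counts), len(sample))
```

Both sorts are stable, so lines with equal counts come out in first-seen order either way.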

reflective bandwagon

Thursday, June 14th, 2007

Here’s another thing I’ve had enough of: The graphic design bandwagon of which this image is a perfect example:


This technique is like a rash all over the web. It’s one thing to jump on the bandwagon and make your site look all cool and Web2.0-esque, but there’s another thing about these images that bugs me.

I don’t understand them.

There’s something about them that just doesn’t work for me. When I look at an image like the above, it somehow doesn’t sit right in my mind. I mean, where’s the light coming from? That’s not a shadow, it’s a reflection. It’s bouncing off that nice shiny black highly-reflective surface. So I guess the solution is that there is a bright light somewhere behind me and above my head. Is that it?

Images that have a shadow next to them or behind them are so much easier to deal with. But that was the bandwagon 10 years ago. Now we have the Web2.0 effect in full color, not boring gray. It’s romantic, it’s engaging, and it’s coming right at you, like, like, yes like a perfect reflection on a cool and glassy alpine lake.

And it’s….. everywhere.

it’s long

Thursday, June 14th, 2007

There are a few things that bug me on the internet.

One is that people often warn each other that articles are long, or apologize for writing long blog entries. There’s nothing inherently wrong with that. When it turns out though that these items are just a couple of screenfuls, you start to wonder what we’re all coming to. And yes, I know, it’s the 21st century, we’re all living at internet speed now, who’s got the time, etc.

OTOH, a word like “long” can be used to convey information. You can look at the word “long” and form some idea of just how long the long thing might be. And these days, it ain’t very long. Maybe we’re in the middle of a transition in which a word comes to mean its opposite.

Marc Andreessen recently began to blog, and the blogosphere is all abuzz. He writes tolerably well, and he’s got interesting comments on many things, but there’s a real downside: his posts are really long. Here’s a random example of someone who agrees.

That’s weird.

From where I sit, if someone writes well and is interesting or otherwise provocative, you wish they’d write more, not less. You want it to be long. Half a dozen web pages is not long. I read In Search of Lost Time last year. It took me 6 months and at 4300 pages or so, I think it qualifies as long. I’m reading Orwell’s letters, essays, and journalism. At 2200 pages, it seems fairly long too. I wished Proust was longer. I’ll probably wish Orwell was longer too. I tried reading The Decline and Fall of the Roman Empire (3500 pages), but the 7-volume “leatherette” set I bought stinks of old cigarette smoke and I couldn’t bear it.

How did we get from “long” meaning something like War and Peace (1100 pages) or Anna Karenin (850 pages) all the way to a 6-page (single narrow column) blog posting (with plenty of white space)?

What word should we now use for things that are longer than 6 pages or that require more than 5 minutes to read? Epic?

resorting to regular expressions

Wednesday, June 13th, 2007

I was going to write a much longer set of thoughts on moving to Python, but I don’t have time. Instead I’ll summarize by saying that I programmed for 28 years in various languages before switching to Python nearly 2 years ago.

I like Python. A lot. And there are multiple reasons, which I may go into another time.

One thing that has struck me as very interesting is my use of regular expressions. I came to Python after doing a lot of work in Perl (about 8 years). In Perl I used regular expressions all the time. And I mean every single day, many times a day. I like regular expressions. I understand pretty well how they work. I found multiple errors in the 2nd edition of Mastering Regular Expressions. I made a 20% speedup to version 4.72 of Grepmail with a trivial change to a regex. I put both GNU and Henry Spencer regex support into strsed. I use them in emacs lisp programming and in general day-to-day emacs usage, and in their limited form on the shell command line and in grep.

So given that regular expressions are so powerful, that I well know how to wield them, and that I did so perhaps ten thousand times during those 8 years of Perl, you might expect that I’d use them frequently in Python.

But that’s not the case.

In two years of writing Python almost every day, I think I’ve probably only used regular expressions about 10 times!

I’m not going to speculate now on why that might be the case. I’m writing this partly to see if others (in my huge circle of readers) have experienced something similar. I was prompted to write by an svn check in message of Daniel’s last night. He said:

You know things are bad when you find yourself resorting to regular expressions

And I knew exactly what he meant. When I find myself reaching for the Python pocket guide to refresh my memory on using Python regular expressions, it’s such an unusual event (especially given the contrast mentioned above) that I find myself wondering if maybe I’m doing something really inefficient and unPythonic.

Comments on Productivity and being Always-On

Monday, June 11th, 2007

Antonio over at the Onda has a post up about Productivity and being Always-On. He’s got comments turned off, so I’m going to make a few here.

First of all, I really enjoy Antonio’s writings. That’s why I read his blog. But today I just need to push back a little :-) I think all four of Antonio’s points about what you can expect to go wrong are rather weak and/or misleading.

Let’s go through them.

Power (this one was on me for being unprepared). Between Spain and England, I discovered 3 different plug types. What is more, if you travel with a laptop and a phone (more than one device to plug in) and check in late, good luck getting the hotels to have anything to lend you to plug your American appliances in.

You could substitute the U.S. for Spain or the UK in this sentence and it would remain true. There’s actually a good deal of standardized plug size across Europe. Yes, the UK and the US (and some other countries) do things differently. But Spain is part of a large swathe of countries that follow a standard. I could mention the use of 110 volt devices, but I won’t. But I do suggest, just for fun, going to the reception of some US hotels and asking them if they have a European plug converter they could lend you. Or try asking for two. I’ve lived 10 years in the US and 10 years in Europe and I have a fairly strong opinion about where you’re more likely to find accommodating help for stuff that requires regular employees of a company to even be aware of the existence of other countries.

Consistent SMS/data on your cellphone. Having just switched to a GSM network, I was really excited by the prospect of 3G networks and zippy-fast mobile data. While voice worked everywhere, SMS and data did not. In fact, SMS was the flakiest of all of the services that I’ve come to rely on— I could receive messages almost everywhere, but I had at best 50% odds of being able to send them.

I’d put this down to (probably) having a mixture of European and US carriers involved. I also spent nearly 5 years working in the cell phone industry and know first hand from various carriers that passing SMS between their networks is (or was a few years back) hugely flaky. Someone from a US carrier (I don’t remember which) told me that, officially, US-Euro SMS was not supported by their network but that messages did sometimes “leak” through, and they weren’t sure how! In Spain I find SMS extremely reliable, and I send probably 200/month. When in the US I also have not-infrequent problems, in both directions.

And as far as the wi-fi is concerned, it does seem to be fairly ubiquitous, but in 100% of the cases it was expensive and encumbered by either its billing mechanism or by some lame proxy server setup that blocked most of the useful Internet services you’d want to get access to.

The same could easily be said of the US, and probably every other country. This is too general a complaint – I’ve encountered expensive brain-dead wifi all over the place. One pleasant exception is the airport at Las Vegas, with free wifi. Plus see below.

Overall Internet speed. Finally, the speed of “broadband” connections (especially in Spain) is painful. In this new world of rich Internet applications, it’s easy to forget that we’ve only just been able to get to the point where we can use them in the US and that this is far from a given for other parts of the world. For instance, in Spain was completely unusable, and even Gmail was severely hobbled by the dearth of bandwidth.

This is also very weak. Who was the ISP? In what city? What sort of bandwidth was the contract? How many different places and ISPs did you try? It’s like saying “I went to the US and my broadband connection sucked, so therefore broadband connections suck in the US”. FWIW, I’ve had an ADSL connection with a fixed IP address in Barcelona for about 7 years. For several of those years, while I was paying about US$30/month for it, the CEO of the company I worked for in Manhattan couldn’t get any DSL connection at all to his Manhattan apartment. I mean nothing. He was using a modem for years while I had a much zippier always-on connection. These days I have a theoretical max of 1Mb up and 20Mb down, and the last time I tested it, it was running at about 6Mb. A connection at that speed can be had from Ya for just US$26/month. I ssh into servers and the connections stay up until I close them (often many days). I can even work with Tabblo. I know dozens of people here who use GMail as their only mail source, and I’ve seen it working just fine, without noticeable delay.

That’s it for now I guess. While I’m sure Antonio’s experiences happened, they read like someone comparing their comfortable home setup with what they experienced as a foreign tourist. Of course those experiences will be very different, even if the underlying services are identical. You see the same thing when tourists complain about how expensive a country is. Yes, you can pay 12 euros (US$16!) for a large (and I mean beer stein large) Fanta on the Ramblas. But that says more about you than it does about Spain :-)

Orwell on T. S. Eliot and the path from existential angst to serial entrepreneur

Thursday, June 7th, 2007

I like George Orwell. A tired fool got me started on the four-volume collection of Orwell’s essays, journalism, and letters. It’s great. Among many things I could say, one is that you know you’re reading someone damned good if you’re fascinated by their thoughts on something you formerly had no interest or experience in. There’s the essay on Dickens that I mentioned earlier, essays on cheap vulgar postcards, boys’ magazines, and much else besides. Gore Vidal is similarly compelling, and I think I would take his collected essays even over those of Orwell. Christopher Hitchens is similarly provocative but not in the same class as a writer. Very few are.

Today I was reading an Orwell review of three T. S. Eliot poems. I’m not into Eliot and I’m not into poetry. Like Gore Vidal’s, Orwell’s reviews are wonderful – balanced and surgical skewerings. Anyway, I came across the following, which I enjoyed enormously and decided to post:

But the trouble is that conscious futility is something only for the young. One cannot go on ‘despairing of life’ into a ripe old age. One cannot go on and on being ‘decadent’, since decadence means falling and one can only be said to be falling if one is going to reach the bottom reasonably soon. Sooner or later one is obliged to adopt a positive attitude towards life and society. It would be putting it too crudely to say that every poet in our time must either die young, enter the Catholic Church, or join the Communist party, but in fact the escape from the consciousness of futility is along those general lines. There are other deaths besides physical death, and there are other sects and creeds besides the Catholic Church and the Communist Party, but it remains true that after a certain age one must either stop writing or dedicate oneself to some purpose not wholly aesthetic. Such a dedication necessarily means a break with the past:

every attempt
Is a wholly new start, and a different kind of failure

Because one has only learnt to get the better of words
For the thing one no longer has to say, or the way in which
One is no longer disposed to say it. And so each venture
Is a new beginning, a raid on the inarticulate
With shabby equipment always deteriorating
In the general mess of imprecision of feeling,
Undisciplined squads of emotion.

Apart from the fact that I am much too impatient to read poetry, one of my problems is that I never have any idea what it’s about. But at least the above is clear. It wonderfully captures the inevitable progression from the troubled search for meaning of existential youth to the amorphous struggles of the serial entrepreneur.