Archive for January 4th, 2008

Tagging in the year 3000 (BC)

Friday, January 4th, 2008

Jimmy Guterman recently called Marcel Proust an Alpha Geek and asked for thoughts on “what from 100 years ago might be the hot new technology of 2008?”

Here’s something about 5000 years older. As a bonus there’s a deep connection with what Fluidinfo is doing.

Alex Wright recently wrote GLUT: Mastering Information Through the Ages. The book is good. It’s a little dry in places, but in others it’s really excellent. I especially enjoyed the last 2 chapters, “The Web that Wasn’t” and “Memories of the Future”. GLUT has a non-trivial overlap with the even more excellent Everything is Miscellaneous by David Weinberger.

In chapter 4 of GLUT, “The Age of Alphabets”, Wright describes the rise of writing systems around 3000 BC as a means of recording commercial transactions. The details of the transactions were written onto a wet clay tablet, signed by the various parties, and then baked. Wright (p50) continues:

Once the tablet was baked, the scribe would then deposit it on a shelf or put it in a basket, with labels affixed to the outside to facilitate future search and retrieval.

There are two comments I want to make about this. One is a throwaway answer to Jimmy Guterman’s request, but the other deserves consideration.

Firstly, this is tagging. Note that the tags are attached after the data is put onto the clay tablet and it is baked. This temporal distinction is important – it’s not like other mentions of metadata or tagging given by Wright (e.g., see p51 and p76). Tags could presumably have different shapes or colors, and be removed, added to, etc. Tags can be attached to objects you don’t own – like using a database to put tags on a physically distant web page you don’t own. No-one has to anticipate all the tag types, or the uses they might be put to. If a Sumerian scribe decided to tag the best agrarian deals of 3000 BC or all deals involving goats, he/she could have done it just as naturally as we’d do it today.

Secondly, I find it very interesting to consider the location of information here and in other systems. The tags that scribes were putting on tablets in 3000 BC were stored with the tablets. They were physically attached to them. I think that’s right-headed. To my mind, the tag information belongs with the object that’s being tagged. In contrast, today’s online tagging systems put our tags in a physically separate location. They’re forced to do that because of the data architecture of the web. The tagging system itself, and the many people who may be tagging a remote web page, don’t own that page. They have no permission to alter it.

Let’s follow this thinking about the location of information a little further…

Later in GLUT, Wright touches on how the card catalog of libraries became separated from the main library content, the actual books. Libraries became so big and accumulated so many volumes that it was no longer feasible to store the metadata for each volume with the volume. So that information was collected and stored elsewhere.

This is important because the computational world we all inhabit has similarly been shaped by resource constraints. In our case the original constraints are long gone, but we continue to live in their shadow.

I’ll explain.

We all use file systems. These were designed many decades ago for a computing environment that no longer exists. Machines were slow. Core memory and disks were tiny. Fast indexing and retrieval algorithms had yet to be invented. Today, file content and file metadata are firmly separated. File data is in one place while file name, permissions, and other metadata are stored elsewhere. That division causes serious problems. The two systems need different access mechanisms. They need different search mechanisms.

Now would be a good time to ask yourself why it has traditionally been almost impossible to find a file based simultaneously on its name and its content.

Our file systems are like our libraries. They have a huge card catalog just inside the front door (at the start of the disk), and that’s where you go to look things up. If you want the actual content you go fetch it from the stacks. Wandering the stacks without consulting the catalog is a little like reading raw disk blocks at random (that can be fun btw).

But libraries and books are physical objects. They’re big and slow and heavy. They have ladders and elevators and are traversed by short-limbed humans with bad eyesight. Computers do not have these characteristics. By human standards, they are almost infinitely fast and their storage is cheap and effectively infinite. There’s no longer any reason for computers to separate data from metadata. In fact there’s no need for a distinction between the two. As David Weinberger put it, in the real world “everything is metadata”. So it should be in the computer world as well.

In other words, I think it is time to return to a more natural system of information storage. A little like the tagging we were doing in 3000 BC.

Several things will have to change if we’re to pull this off. And that, gentle reader, is what Fluidinfo is all about.

Stay tuned.

Both my kids beat me at Connect 4

Friday, January 4th, 2008

My 2 older kids got Connect 4 for xmas.

I’ve liked Connect 4 for a long time. The first TCP/IP socket programming I ever did was in 1987 and it was code to let two people on the net play Connect 4 against each other, with graphics done using curses code written with Andrew Hensel. Later I wrote a machine opponent that used some form of alpha-beta pruning and which was popular among a few CS grad students at the University of Waterloo. Amazingly, you can still find traces of my youthful code (and function names!) online. I like/d to think I am/was a pretty good player.

So you can imagine my confidence as I walked into the kids’ room and asked them who wanted to be beaten at Connect 4 by the champion of the world. My friend Russell has a take-no-prisoners attitude towards playing games with his kids. He wouldn’t dream of deliberately letting them win at anything. I let mine win very often, and find it hard to imagine how you could teach a small kid to play (say) chess if you don’t give them a chance. Anyway, tonight I decided I was going to show no mercy and whip them repeatedly at Connect 4.

I was so wrong.

At xmas just a couple of weeks ago I remember explaining the game to Sofia (8), and thinking what a vast gap existed between her understanding of the game and mine. Of course she quickly got the idea, but she had no idea at all of strategy. Lucas (6) came up during the explanation and of course had to be included, which meant an even more painstaking explanation from the champion of the world to his tabula rasa midgets.

Yesterday Ana told me that the kids, Sofia especially, were getting quite good. I smiled a knowing smile, and inside I scoffed.

Tonight I played Sofia in the first game and won fairly quickly. I told them we were going to play winner stays on, and so I then faced Lucas.

And the little bugger beat me. Fair and square he got me good, knew exactly what he was doing, and celebrated like a wild animal as he dropped the winning piece, while I sat there in shock with a huge smile on my face.

When I finally got back into the game I was up against Sofia. She proceeded to beat me too.

Amazing. Great. Funny. Alarming. How is this possible?

It reminds me of when I was about 12. My father was trying to figure out how to connect something with some cables. I took a look and told him what to do. I’ll never forget it. He knew I was right and he looked straight at me and said “how come you’re smarter than I am?” I guess I shrugged, but inside I was thinking “yep”.

Pride before a fall. Multiple falls. And you wouldn’t want it any other way, of course.

Still, they might have waited a few more years before mowing me down.

More email customization

Friday, January 4th, 2008

My recent email changes are working out well. Yesterday morning I woke up and didn’t read email. That’s because I didn’t have any email!

Well, I did, but procmail had filed it all into mail/incoming/IN-20080103.spool because none of it needed immediate attention. I have set VM up so that it knows to look for an x.spool file if I ask it to visit a file called x. That’s one line of elisp in VM: (setq vm-spool-file-suffixes (list ".spool")).

I like this setup because 1) it keeps my main inbox almost empty, 2) it keeps non-essential emails out of my face, and 3) it puts pressure on me to quickly deal with stuff that collects in the daily file, because I know that if I don’t it’s going to be forgotten.

And how to get to the daily file when I do decide to go look? Yes, another little piece of code:

  (define-key vm-mode-map "i"
    (lambda ()
       (interactive)
       (vm-visit-folder
        (expand-file-name
         (concat "~/mail/incoming/IN-"
                 (format-time-string "%Y%m%d"))))))

which simultaneously defines a function to take me (in VM, in emacs) to today’s file and puts that function on the “i” key in VM. So I just hit a single key and I’m automatically looking at the non-time-critical mail file for the day. I’ll probably write a little function to take me to yesterday’s too.
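The function for yesterday's file could be sketched along the following lines. This is only an illustration, not code from the original setup: the helper name my-vm-daily-folder and the choice of the "I" key are assumptions, and the binding may shadow something VM already puts on that key. Like the code above, it assumes VM is loaded (so that vm-mode-map exists) and that vm-spool-file-suffixes is set so the matching .spool file is picked up.

```elisp
;; Hypothetical helper (not part of the original setup): return the path
;; of the daily folder for the day DAYS-AGO days before today.
(defun my-vm-daily-folder (days-ago)
  (expand-file-name
   (concat "~/mail/incoming/IN-"
           (format-time-string
            "%Y%m%d"
            ;; Step back DAYS-AGO * 24 hours from the current time.
            (time-subtract (current-time) (days-to-time days-ago))))))

;; "I" is an arbitrary key choice for visiting yesterday's folder.
(define-key vm-mode-map "I"
  (lambda ()
    (interactive)
    (vm-visit-folder (my-vm-daily-folder 1))))
```

Because the helper steps back in 24-hour units, it can land on the wrong date for an hour or so around a daylight-saving change. Calling (my-vm-daily-folder 0) gives today's path, so the same helper could also back the existing "i" binding.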

And yes, I guess this is all highly personalized, but these are things that I do many times a day every day of my life. So I’m happy to streamline them. And all the code is trivial. That’s the most interesting thing. With a tiny bit of code you can do so much, and without it you can only do what other programmers thought you might want or need to be able to do.