Posted Friday, January 4th, 2008 at 7:44 pm under books, representation, tech.

Tagging in the year 3000 (BC)

Jimmy Guterman recently called Marcel Proust an Alpha Geek and asked for thoughts on “what from 100 years ago might be the hot new technology of 2008?”

Here’s something about 5000 years older. As a bonus there’s a deep connection with what Fluidinfo is doing.

Alex Wright recently wrote GLUT: Mastering Information Through the Ages. The book is good. It’s a little dry in places, but in others it’s really excellent. I especially enjoyed the last two chapters, “The Web that Wasn’t” and “Memories of the Future”. GLUT has a non-trivial overlap with the even more excellent Everything is Miscellaneous by David Weinberger.

In chapter 4 of GLUT, “The Age of Alphabets”, Wright describes the rise of writing systems around 3000 BC as a means of recording commercial transactions. The details of the transactions were written onto a wet clay tablet, signed by the various parties, and then baked. Wright (p50) continues:

Once the tablet was baked, the scribe would then deposit it on a shelf or put it in a basket, with labels affixed to the outside to facilitate future search and retrieval.

There are two comments I want to make about this. One is a throwaway answer to Jimmy Guterman’s request, but the other deserves consideration.

Firstly, this is tagging. Note that the tags are attached after the data is put onto the clay tablet and it is baked. This temporal distinction is important – it sets these labels apart from the other mentions of metadata or tagging given by Wright (e.g., see p51 and p76). Tags could presumably have different shapes or colors, and could be removed or added to later. Tags can be attached to objects you don’t own – like using a database to put tags on a physically distant web page you don’t own. No one has to anticipate all the tag types, or the uses they might be put to. If a Sumerian scribe decided to tag the best agrarian deals of 3000 BC or all deals involving goats, he/she could have done it just as naturally as we’d do it today.

Secondly, I find it very interesting to consider the location of information here and in other systems. The tags that scribes were putting on tablets in 3000 BC were stored with the tablets. They were physically attached to them. I think that’s right-headed. To my mind, the tag information belongs with the object that’s being tagged. In contrast, today’s online tagging systems put our tags in a physically separate location. They’re forced to do that because of the data architecture of the web. The tagging system itself, and the many people who may be tagging a remote web page, don’t own that page. They have no permission to alter it.
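To make the "physically separate location" concrete, here is a minimal sketch of how an online tagging service might hold tags: in its own table, keyed by the URL of a page it cannot modify. The class name, method names, and example URLs are all made up for illustration; no particular service is being described.

```python
from collections import defaultdict


class ExternalTagStore:
    """Hypothetical tag store: tags live here, not with the tagged page."""

    def __init__(self):
        # Maps a URL to the set of tags people have attached to it.
        self._tags_by_url = defaultdict(set)

    def tag(self, url, *tags):
        # Anyone can tag any URL; the page itself is never touched.
        self._tags_by_url[url].update(tags)

    def untag(self, url, tag):
        # Removing a tag that is not present is a no-op.
        self._tags_by_url[url].discard(tag)

    def tags_for(self, url):
        return sorted(self._tags_by_url.get(url, set()))

    def urls_tagged(self, tag):
        return sorted(u for u, t in self._tags_by_url.items() if tag in t)


store = ExternalTagStore()
store.tag("http://example.com/deals/17", "agrarian", "goats")
store.tag("http://example.com/deals/23", "goats")
store.untag("http://example.com/deals/17", "agrarian")
```

Notice that nothing in the store can tell you what the page says, and nothing on the page can tell you how it has been tagged. The tag and the tablet have come apart.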

Let’s follow this thinking about the location of information a little further…

Later in GLUT, Wright touches on how the card catalog of libraries became separated from the main library content, the actual books. Libraries became so big and accumulated so many volumes that it was no longer feasible to store the metadata for each volume with the volume. So that information was collected and stored elsewhere.

This is important because the computational world we all inhabit has similarly been shaped by resource constraints. In our case the original constraints are long gone, but we continue to live in their shadow.

I’ll explain.

We all use file systems. These were designed many decades ago for a computing environment that no longer exists. Machines were slow. Core memory and disks were tiny. Fast indexing and retrieval algorithms had yet to be invented. Today, file content and file metadata are firmly separated. File data is in one place while file name, permissions, and other metadata are stored elsewhere. That division causes serious problems. The two systems need different access mechanisms. They need different search mechanisms.

Now would be a good time to ask yourself why it has traditionally been almost impossible to find a file based simultaneously on its name and its content.
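A rough sketch of what that search looks like in practice, using only Python's standard library. The function name and its parameters are my own invention. The point is that the name is checked against the directory listing (the catalog), while the content check means opening and reading each candidate file (a trip to the stacks).

```python
import fnmatch
import os


def find_by_name_and_content(root, name_pattern, needle):
    """Return paths under root whose name matches name_pattern
    and whose text contains needle."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Step 1: a metadata query. Only the directory listing is consulted.
        for filename in fnmatch.filter(filenames, name_pattern):
            path = os.path.join(dirpath, filename)
            # Step 2: a content query. A different mechanism entirely:
            # open the file and scan its bytes.
            try:
                with open(path, "r", errors="ignore") as handle:
                    if needle in handle.read():
                        matches.append(path)
            except OSError:
                # Unreadable file; skip it.
                continue
    return sorted(matches)
```

A call such as `find_by_name_and_content(".", "*.txt", "goats")` works, but only by gluing two unrelated lookups together by hand.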

Our file systems are like our libraries. They have a huge card catalog just inside the front door (at the start of the disk), and that’s where you go to look things up. If you want the actual content you go fetch it from the stacks. Wandering the stacks without consulting the catalog is a little like reading raw disk blocks at random (that can be fun btw).

But libraries and books are physical objects. They’re big and slow and heavy. They have ladders and elevators and are traversed by short-limbed humans with bad eyesight. Computers do not have these characteristics. By human standards, they are almost infinitely fast and their storage is cheap and effectively infinite. There’s no longer any reason for computers to separate data from metadata. In fact there’s no need for a distinction between the two. As David Weinberger put it, in the real world “everything is metadata”. So it should be in the computer world as well.
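As a toy illustration only (this is not a description of any real system, and every name in it is hypothetical), suppose each object is just a bag of attributes, with name, content, and tags all living side by side. Then one query mechanism covers everything:

```python
def find(objects, **criteria):
    """Return objects whose attributes satisfy every predicate in criteria.

    Each keyword is an attribute name; each value is a function that
    takes the attribute's value and returns True or False.
    """
    return [
        obj for obj in objects
        if all(key in obj and test(obj[key]) for key, test in criteria.items())
    ]


# Name, content, and an after-the-fact rating are all plain attributes.
objects = [
    {"name": "deal-17.txt", "content": "three goats for barley", "rating": 5},
    {"name": "deal-23.txt", "content": "two oxen for silver"},
]

# One query, spanning what a file system would call metadata and data.
hits = find(
    objects,
    name=lambda n: n.endswith(".txt"),
    content=lambda c: "goats" in c,
)
```

The rating on the first object was not anticipated by any schema. It was simply attached, like a label tied to a basket.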

In other words, I think it is time to return to a more natural system of information storage. A little like the tagging we were doing in 3000 BC.

Several things will have to change if we’re to pull this off. And that, gentle reader, is what Fluidinfo is all about.

Stay tuned.