Delicious to FluidDB

Monday, December 20th, 2010

In case you’ve missed the brouhaha, Yahoo were rumoured to be shutting down the rather excellent delicious bookmark/tagging service. Having read this post in the Washington Post and checked with the horse’s mouth, it looks to me like the rumours are mistaken. Nevertheless, a plethora of tools for grabbing and backing up data from delicious have been posted “just in case”.

Since I (ntoll) have always wanted to use FluidDB (our openly writable shared database – sign up here) as a delicious clone, the rumours prompted me to quickly knock together a script to extract my bookmarks from delicious and store them in FluidDB. I’m not the only person to have thought of this: Fluidinfo advisor Nick Radcliffe described one method for achieving this aim last year. Over the weekend both Nick and I have been thinking hard about how to organise the imported data within FluidDB.

The result is a simple standalone Python script called delicious2fluid that does exactly what its name implies. The source code is hosted at GitHub and I’ve added it as a package on PyPI (the Python Package Index). The rest of this post explains how to use delicious2fluid and then describes some of the benefits of using FluidDB (a flexible schema, simple yet powerful queries, values associated with tags etc).

There are two options for installation:

  1. Download the source code and run the installation script:
    $ git clone git://github.com/ntoll/delicious2fluid.git
    $ cd delicious2fluid
    $ python setup.py install
  2. Use PyPI with pip or easy_install:
    $ pip install -U delicious2fluid
    or
    $ easy_install delicious2fluid

Once installed you simply need to run the command and answer the questions:

$ delicious2fluid
Delicious username: ntoll
Delicious password:
FluidDB username: ntoll
FluidDB password:
FluidDB path (hit return to default to root namespace: ntoll)
2010-12-17 21:09:12,601 - d2f - INFO - Grabbing bookmarks from delicious
2010-12-17 21:09:29,223 - d2f - INFO - 200 OK
2010-12-17 21:09:29,492 - d2f - INFO - Creating delicious namespace in FluidDB
... etc ...

The username and password for both services are not stored in any way, shape or form. As you might have guessed, the script writes a log of what it’s doing to stdout. If you do encounter any problems then the d2f.log file will contain lots of debug information (bug reports and suggestions most welcome!).

The script will ignore private bookmarks, since we don’t want it to be responsible for leaking information, but it will import all the tags you use even if they’re attached to private bookmarks. It’s important to note that the existence of tags in FluidDB is public, since every tag has an associated object with an appropriate “about-tag” value that identifies it as an object about a specific tag (you have been warned!).
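To make that behaviour concrete, here is a minimal sketch (not the actual delicious2fluid code) of how an XML dump might be split into public bookmarks and the full set of tag names. The sample XML, the attribute names (href, description, tag, shared) and the function name split_posts are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of a delicious "posts" export; the
# attribute names used here are assumptions about that format.
SAMPLE = """<posts user="ntoll">
  <post href="http://example.com/a" description="Public link"
        tag="foo bar" shared="yes"/>
  <post href="http://example.com/b" description="Private link"
        tag="baz" shared="no"/>
</posts>"""


def split_posts(xml_text):
    """Return (public_posts, all_tag_names) from an XML dump."""
    public, tags = [], set()
    for post in ET.fromstring(xml_text).iter("post"):
        # Tag names are collected from every bookmark, private or not.
        tags.update(post.get("tag", "").split())
        # Only bookmarks that are not marked private are kept for import.
        if post.get("shared", "yes") != "no":
            public.append(dict(post.attrib))
    return public, tags


public_posts, tag_names = split_posts(SAMPLE)
print(len(public_posts), sorted(tag_names))
```

In this sketch the private bookmark’s URL never leaves the function, but its tag name (“baz”) does, which is exactly the trade-off described above.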

After grabbing an XML dump of your bookmarks from delicious the script creates the following tags in your root namespace (override the default location of the tags by providing a namespace path for the final question that the script asks you):

  • USERNAME/title
  • USERNAME/notes

Metadata from delicious is stored with tags created under the delicious namespace:

  • USERNAME/delicious/hash
  • USERNAME/delicious/time
  • USERNAME/delicious/meta
  • USERNAME/delicious/tag

FluidDB stores the tag names as a set of strings in the tag named USERNAME/delicious/tag. Furthermore, each tag you create in delicious will be recreated in FluidDB under your root namespace:

  • USERNAME/TAGNAME

Obviously, “USERNAME” is replaced with your username on FluidDB (i.e. your root namespace if you’ve not overridden the default location). These tags annotate objects representing bookmarks in FluidDB (one object per bookmark). The object’s about tag value is simply the URL that the bookmark references so everyone else can easily find and tag it.

For example, say I (ntoll) only used three tags (“foo”, “bar” and “baz”) then the following tags will be created in FluidDB:

  • ntoll/foo
  • ntoll/bar
  • ntoll/baz

These tags are automatically added to the correct objects to indicate how the original bookmark was tagged. Of course there is nothing to stop anyone from adding more tags and information or creating more objects to represent bookmarks that might not have originated from delicious.
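The mapping described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the script’s real implementation: the function name annotations_for is hypothetical, and I’m assuming that the export’s description attribute feeds the title tag and its extended attribute feeds the notes tag.

```python
def annotations_for(post, root):
    """Map one bookmark (a dict of export attributes) to FluidDB tag paths.

    Returns the about value, a dict of tag path -> value, and the list of
    per-tag paths recreated under the chosen root namespace.
    """
    names = post.get("tag", "").split()
    values = {
        # Assumed mapping: description -> title, extended -> notes.
        f"{root}/title": post.get("description"),
        f"{root}/notes": post.get("extended"),
        f"{root}/delicious/hash": post.get("hash"),
        f"{root}/delicious/time": post.get("time"),
        f"{root}/delicious/meta": post.get("meta"),
        # The original tag names are kept together as a set of strings.
        f"{root}/delicious/tag": set(names),
    }
    # Each delicious tag also becomes its own tag under the root namespace.
    tag_paths = [f"{root}/{name}" for name in names]
    # The bookmark's URL serves as the object's about value.
    return post["href"], values, tag_paths


about, values, tag_paths = annotations_for(
    {"href": "http://example.com/", "description": "Example",
     "tag": "foo bar baz"},
    "ntoll",
)
print(about)
print(tag_paths)
```

Passing a different root (the answer to the script’s final question) moves every path in one go, which is all that overriding the default location amounts to in this sketch.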

I’ve succeeded in importing all my tags and bookmarks (it took a couple of hours for c. 2000 tags and 1800 bookmarks). If you’re interested, use the FluidDB Explorer to take a look at a user-friendly view of my delicious tags. Open the tree view on the left-hand side and click on the tags to find the associated objects/bookmarks. You’ll also see the query used to generate the results (usually something along the lines of “has ntoll/delicious/tags/FOO”).
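If you’d rather not use the Explorer, a query like that could in principle be sent over HTTP. The sketch below only builds the request URL and makes no network call; the base URL and the /objects?query= endpoint are my assumptions about the FluidDB HTTP API and are not taken from the script, and objects_query_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: the FluidDB HTTP endpoint; adjust if the API differs.
BASE_URL = "http://fluiddb.fluidinfo.com"


def objects_query_url(query, base_url=BASE_URL):
    """Build a URL asking for the objects that match a FluidDB query."""
    # urlencode percent-escapes the slashes and turns spaces into "+".
    params = urlencode({"query": query})
    return f"{base_url}/objects?{params}"


url = objects_query_url("has ntoll/delicious/tags/FOO")
print(url)
```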

You’ll also notice that I’ve actually put all my tags in the ntoll/delicious/tags namespace and ignored the default “schema”. Why have I done this? Three reasons:

  1. It helps to indicate the origin of the data.
  2. It stops my root namespace from getting polluted with (potentially) thousands of tags.
  3. It indicates that all the tags under the “delicious” namespace are to be used just like in the delicious web-application.

But won’t that mean I’ve broken FluidDB, since I’ve ignored the precedent set by Nick Radcliffe in the blog posts I mentioned earlier?

Not at all! One of the strengths of FluidDB is that it works well across different or evolving schema. For example, I can still find interesting bookmarks with queries such as:

has njr/fluidinfo and has ntoll/delicious/tags/fluidinfo

This leads me to an object with the id f3f80612-7015-4a61-a1ba-94087e9aa582 and fluiddb/about value of “http://paulerb.typepad.com/infosharing/2009/01/is-metadata.html” (a really great blog post, by the way). I’ve used Nick’s visualisation tool to create the following representation of the object:

If you’re eagle-eyed you’ll have spotted that I’ve also added an “ntoll/rating” tag to this object with an associated value of 10 (it’s at the bottom left-hand side). This demonstrates several important aspects of FluidDB:

  • I’m not limited to using a pre-defined schema. I can annotate any object with any tag linking it to any type of data – be it a primitive (searchable) value like an integer or something more opaque like a PDF document (contrast this with delicious’s value-less tags).
  • It’s possible to ask the tag for its description, in which case this particular one will return “An indication of how I rate something”. Since I am the only person who could have created this tag, you know “I” = ntoll.
  • Because the tag is openly readable you can use it in your queries. For example, you might want a list of all the delicious bookmarks to which I’ve given a high rating:
    has ntoll/delicious/description and ntoll/rating >7
    (In fact you could use any tag for which you have “read” permission no matter who created it.)
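To show what the queries in this post are asking for, here is a small in-memory sketch of the “has”, “and” and comparison semantics. It is not how FluidDB evaluates queries; the sample objects, their values and the helper names (has, greater_than, select) are all made up for illustration.

```python
# Hypothetical stand-in for a few FluidDB objects:
# about value -> {tag path: value}; None marks a tag with no value.
OBJECTS = {
    "http://example.com/a": {
        "njr/fluidinfo": None,
        "ntoll/delicious/tags/fluidinfo": None,
        "ntoll/delicious/description": "First bookmark",
        "ntoll/rating": 10,
    },
    "http://example.com/b": {
        "ntoll/delicious/description": "Second bookmark",
        "ntoll/rating": 9,
    },
    "http://example.com/c": {
        "ntoll/delicious/description": "Third bookmark",
        "ntoll/rating": 5,
    },
    "http://example.com/d": {"njr/fluidinfo": None},
}


def has(tag):
    """Match objects that carry the tag, whatever its value."""
    return lambda tags: tag in tags


def greater_than(tag, threshold):
    """Match objects whose tag holds a number above the threshold."""
    def match(tags):
        value = tags.get(tag)
        return isinstance(value, (int, float)) and value > threshold
    return match


def select(*conditions):
    """Return the about values of objects meeting every condition ("and")."""
    return sorted(
        about for about, tags in OBJECTS.items()
        if all(condition(tags) for condition in conditions)
    )


# has njr/fluidinfo and has ntoll/delicious/tags/fluidinfo
shared = select(has("njr/fluidinfo"), has("ntoll/delicious/tags/fluidinfo"))
# has ntoll/delicious/description and ntoll/rating >7
rated = select(has("ntoll/delicious/description"),
               greater_than("ntoll/rating", 7))
print(shared)
print(rated)
```

Note that the first query mixes tags from two different users’ namespaces, and the second relies on the tag having a value, which is the thing delicious’s own tags can’t offer.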

In conclusion, it’s early days for this script. Whilst its original purpose as insurance against delicious’s demise is no longer relevant, it has provided an opportunity to demonstrate some of the interesting ways in which the openly writable, social and evolutionary approach of FluidDB adds value to a collection of bookmarks.