Putting metadata onto tweets with Fluidinfo

Tuesday, December 1st, 2009

Various articles have recently discussed adding metadata to Twitter tweets – see the posts by Nova Spivack, Robert Scoble, and Dave Winer (who also suggests we need a programming language built into a Twitter client).

These are the sorts of things that Fluidinfo was designed to support, and you can do them today. If you want a password to start playing with the Fluidinfo API, send email to api at fluidinfo dot com and we’ll set you up.

In the meantime, here are some examples. I’m doing this at the iPython command line, using the Fluid Object Mapper (FOM) library, written by Ali Afshar. FOM provides a natural way to work with Fluidinfo objects, namespaces, tags, etc. But you could use any client-side software you like. The Fluidinfo API is just HTTP.

First, let’s get a connection to Fluidinfo:

from fom.session import Fluid

fdb = Fluid()
fdb.db.client.login('terrycojones', 'PASSWORD')
fdb.bind()

That last line is a bit of internal FOM magic that makes interactive use simpler in what follows. Ignore it for now.

To put metadata onto a tweet, we’ll first ask Fluidinfo for the object that’s about a particular tweet. Let’s take the one in the image above by @novaspivack. That tweet has a URL of http://twitter.com/novaspivack/status/4999653280. We ask Fluidinfo to give us the object “about” that URL:

from fom.mapping import Object

o = Object()
o.create('http://twitter.com/novaspivack/status/4999653280')
o.uid
>>> u'ab7fa032-06df-45be-9bb2-859c18c4d342'

The argument in the o.create call is the value of the Fluidinfo about tag. If an object with that about tag already exists, Fluidinfo gives it to us. Otherwise, a new object with that about tag is created. As you can see, the object also has an identifier (o.uid). In case you’re not familiar with Python, the “u” printed in front of the id indicates that the value is a unicode string.

This is a first point of interest. We’ve just created a Fluidinfo object corresponding to an arbitrary tweet. We didn’t ask for permission, we just did it. It’s a bit like a wiki: you can ask a wiki for its page on anything, and if no such page exists, the wiki just makes you a new one. Fluidinfo does the same thing with its objects and about tag. If you want to think about Fluidinfo in all its generality, you should now consider that the about tag above could have been for any tweet, including tweets that don’t exist (or don’t yet exist), for any URL, in fact for any string. We also could have followed Nova’s suggestion and used an about value like "twitter.com/id=4999653280". But we’re getting ahead of ourselves.
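
The wiki-like behaviour can be pictured with a toy in-memory model. This is only a sketch of the semantics described above, not Fluidinfo's implementation; the class name AboutStore and its create method are hypothetical.

```python
import uuid


class AboutStore:
    """Toy model of get-or-create by about value (hypothetical, not Fluidinfo code)."""

    def __init__(self):
        self._ids_by_about = {}

    def create(self, about):
        # Return the existing id for this about value, or mint a new one.
        if about not in self._ids_by_about:
            self._ids_by_about[about] = str(uuid.uuid4())
        return self._ids_by_about[about]


store = AboutStore()
first = store.create("http://twitter.com/novaspivack/status/4999653280")
second = store.create("http://twitter.com/novaspivack/status/4999653280")
other = store.create("twitter.com/id=4999653280")

print(first == second)  # same about value, so the same object id
print(first == other)   # different about value, so a different object id
```

Asking twice for the same about value yields the same id, while any other string, whether or not it names a real tweet, gets an object of its own.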

Fluidinfo has a simple query language, so let’s quickly confirm that we can find this object with a search:

fdb.objects.get('fluiddb/about = "http://twitter.com/novaspivack/status/4999653280"')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

The 200 is an HTTP OK status telling us the call succeeded, and you can see one object matched the search and that its id is as expected.
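
Since the API is just HTTP, the same search can also be written as a URL. The sketch below only builds the URL and makes no network request. It assumes the query is sent as a query parameter to /objects on the fluiddb.fluidinfo.com host that appears later in this post, so check the API docs before relying on it; the helper name objects_query_url is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed host, taken from the tag-value URL that appears later in this post.
BASE_URL = "http://fluiddb.fluidinfo.com"


def objects_query_url(query):
    """Build a search URL, assuming the query goes in a 'query' parameter."""
    return BASE_URL + "/objects?" + urlencode({"query": query})


query = 'fluiddb/about = "http://twitter.com/novaspivack/status/4999653280"'
url = objects_query_url(query)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == query)
```

The point of the round trip at the end is that the spaces, quotes and slashes in the query have to be percent-encoded, which urlencode takes care of.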

So how about some metadata? Let’s say I want to add a rating to the object. Here’s a bit of one-time setup. First I get my top-level namespace (which corresponds to my Fluidinfo user name). Then I create a new tag called rating in that namespace:

from fom.mapping import Namespace

ns = Namespace('terrycojones')
ns.create_tag("rating", "A tag for Terry's ratings.", False)

The False argument is telling Fluidinfo that I don’t want the tag to be indexed. Ignore that for now.

The magic of FOM lets us directly examine the tag using Python attributes. So you can get the tag and see its description like so:

rating = ns.tag('rating')
rating.description
>>> u"A tag for Terry's ratings."

At this point we have a new tag, or an abstract tag if you prefer, but we haven’t actually tagged any objects with it. So let’s tag the object we created above for Nova’s tweet:

o.set('terrycojones/rating', 6)

That was pretty easy! The Fluidinfo object that’s about Nova’s tweet now has some metadata on it, a ‘terrycojones/rating’ tag, with a value of 6. Let’s make sure we can get that value back:

o.get('terrycojones/rating')
>>> (6, None)

We get a 2-tuple whose second value is None when the tag’s value is a primitive Python type (in this case an integer).

Let’s do a couple of quick searches for objects with terrycojones/rating tags:

fdb.objects.get('terrycojones/rating = 6')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

fdb.objects.get('terrycojones/rating > 4')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

fdb.objects.get('has terrycojones/rating')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

In each case just that one object is returned, as expected. Note that the last query just tests for the presence of the tag, irrespective of the tag’s value (if any).

So there you have it: arbitrary metadata on tweets, and with a query language to help find things.

But let’s press on and see how things get more interesting.

First of all, you may have noticed that I didn’t have to deal with permissions at all in the above. I was able to create the Fluidinfo object about Nova’s tweet and to tag it without asking permission. In Fluidinfo that’s always the case.

But there is a permissions system. Let’s log in as a different user and try a few things to see how it works. First, I’ll log in as njr, another user whose password I happen to know:

fdb.db.client.login('njr', 'PASSWORD')

The njr user is actually Nicholas Radcliffe, who has written several great introductory articles about Fluidinfo over at About Tag.

Let’s try (as Nick) getting the terrycojones/rating tag from the object for Nova’s tweet:

o.get('terrycojones/rating')
>>> (6, None)

That still works, so we can infer that the terrycojones/rating tag is readable by the njr user. Let’s log in as terrycojones again and have a look at the permissions:

fdb.db.client.login('terrycojones', 'PASSWORD')
fdb.permissions.tag_values['terrycojones/rating'].get('read')
>>> (200, {u'exceptions': [], u'policy': u'open'})

We’ve asked Fluidinfo for read permissions on tag values for the tag terrycojones/rating. The result is a general policy (open), with exceptions (currently empty). Now I’ll put the njr user into the exceptions list, and confirm the result:

fdb.permissions.tag_values['terrycojones/rating'].put('read', 'open', ['njr'])
>>> (204, None)
fdb.permissions.tag_values['terrycojones/rating'].get('read')
>>> (200, {u'exceptions': [u'njr'], u'policy': u'open'})

The 204 status above is just the HTTP way of telling us that the call succeeded and that the reply has no content (as expected).
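
One way to read the policy and exceptions pair returned above is as a default rule plus a list of users for whom the default is reversed. The toy function below is a sketch of that reading, not Fluidinfo's implementation. The name is_allowed is hypothetical, and the closed branch is an assumption, since only the open policy appears in this post.

```python
def is_allowed(user, policy, exceptions):
    """Toy reading of a policy plus exceptions list (hypothetical helper)."""
    if policy == "open":
        # Everyone is allowed except the listed users.
        return user not in exceptions
    if policy == "closed":
        # Assumed mirror image: nobody is allowed except the listed users.
        return user in exceptions
    raise ValueError("unknown policy: %r" % (policy,))


# The read permission as it stands after the put call above.
read_permission = {"policy": "open", "exceptions": ["njr"]}

for user in ("terrycojones", "njr"):
    print(user, is_allowed(user, **read_permission))
```

Under this reading, terrycojones can still read the tag while njr cannot.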

Now let’s reconnect as njr and try getting the terrycojones/rating tag again:

fdb.db.client.login('njr', 'PASSWORD')
o.get('terrycojones/rating')
>>>

You can see we got nothing back. If FOM handled non-OK HTTP responses a little more carefully, you’d see that this request actually got a 401 (Unauthorized) status. Fluidinfo is now refusing to let njr read the tag.

Nick already has a rating tag, called njr/rating, so let’s go get it, make sure there’s not one already on our object, and then tag our object with it:

ns = Namespace('njr')
rating = ns.tag('rating')
o.get('njr/rating')
o.set('njr/rating', 4)
o.get('njr/rating')
>>> (4, None)

Now things are getting interesting. We have tags from different users on the same object. That’s part of the point of Fluidinfo and it’s where the value comes from: putting information together allows you to do nice things, like query on it. After re-connecting as terrycojones, I can now do queries like this:

fdb.objects.get('terrycojones/rating > 5 and njr/rating > 3')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

fdb.objects.get('terrycojones/rating > 5 and njr/rating < 3')
>>> (200, {u'ids': []})

fdb.objects.get('has terrycojones/rating and njr/rating >= 4')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

fdb.objects.get('has terrycojones/rating and has njr/rating')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

fdb.objects.get('has terrycojones/rating except has njr/rating')
>>> (200, {u'ids': []})
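
If it helps to see why those results come out as they do, the queries can be mimicked with plain Python sets: each condition selects a set of object ids, "and" intersects two sets, and "except" subtracts one from the other. This is a toy model under that assumption, not how Fluidinfo evaluates queries; the helper names has and where are hypothetical.

```python
OBJECT_ID = "ab7fa032-06df-45be-9bb2-859c18c4d342"

# Toy store: tag path -> {object id: value}, mirroring the two ratings above.
tags = {
    "terrycojones/rating": {OBJECT_ID: 6},
    "njr/rating": {OBJECT_ID: 4},
}


def has(tag):
    """Ids of all objects carrying the tag, whatever its value."""
    return set(tags.get(tag, {}))


def where(tag, predicate):
    """Ids of all objects whose value for the tag satisfies the predicate."""
    return {oid for oid, value in tags.get(tag, {}).items() if predicate(value)}


# terrycojones/rating > 5 and njr/rating > 3
both_high = where("terrycojones/rating", lambda v: v > 5) & where("njr/rating", lambda v: v > 3)

# terrycojones/rating > 5 and njr/rating < 3
mixed = where("terrycojones/rating", lambda v: v > 5) & where("njr/rating", lambda v: v < 3)

# has terrycojones/rating except has njr/rating
only_terry = has("terrycojones/rating") - has("njr/rating")

print(both_high, mixed, only_terry)
```

The one object satisfies the first query, and the other two come back empty, matching the results above.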

There’s a lot more I could do too, like giving Nick permission to add the terrycojones/rating tag to objects. By the way, Nick has written some nice articles about the Fluidinfo permissions model. See Permissions Worth Getting Excited About and The Permissions Sketch.

For a final look at metadata, let’s put something totally different onto our object:

ns = Namespace('terrycojones')
page = ns.create_tag("page", "Terry's page tag.", False)
o.set('terrycojones/page', '<html><head><title>hi</title></head><body>Hello there!</body></html>', 'text/html')

I’ve just made a new tag called terrycojones/page and tagged our object with it. What’s different here is that the value is a string, and I’m passing a MIME type with it. If I retrieve the value of the tag on the object, you’ll see the MIME type comes back too:

o.get('terrycojones/page')
>>> ('<html><head><title>hi</title></head><body>Hello there!</body></html>',
 'text/html')

and as you might hope, if you go get that tag from that object using a browser, the MIME type is returned in the HTTP Content-type header, so you end up with a real web page, with a predictable URL. Try clicking: http://fluiddb.fluidinfo.com/objects/ab7fa032-06df-45be-9bb2-859c18c4d342/terrycojones/page. We can do the same for any MIME type at all – including ones you invent for your own convenience.
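
The predictable URL is just the object id and the tag path joined onto the objects endpoint. A small helper makes the pattern explicit. It is hypothetical and inferred only from the one URL shown above, so treat the layout as an assumption rather than a documented rule.

```python
def tag_value_url(object_id, tag_path, base="http://fluiddb.fluidinfo.com"):
    """Build the URL of a tag's value on an object (pattern inferred from this post)."""
    return "%s/objects/%s/%s" % (base, object_id, tag_path)


page_url = tag_value_url("ab7fa032-06df-45be-9bb2-859c18c4d342", "terrycojones/page")
print(page_url)
```

Given the object id and tag used in this post, the helper reproduces the link above.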

So there you go. That’s metadata on tweets. With a permissions model, with a query language, with user identity, with the freedom to add anything you want, and with typed data. We don’t need a new programming language for doing this sort of thing. What we need is a better data architecture.

Fluidinfo was designed with exactly this kind of use in mind. And it’s not specific to Twitter or tweets, or anything in fact. So you can put metadata onto anything you like, search on it, continue to own/control your own data, combine it as you like, and get data in and out using a simple HTTP API.

This is all live. It’s up and running, and you can do this today. I should also add that Fluidinfo is still an early alpha, and is not yet particularly fast. For more information on Fluidinfo, start with the high-level description and, if you’re a programmer, read the API docs.

Next time I’ll show you how we’re putting metadata onto Twitter users, and how you can too, of course! I might also start to talk about Tickery, our upcoming Twitter query application.

If you like all this, please pass on this article. We’d love to get the word out about Fluidinfo. It’s a little difficult from Barcelona.