How we made an API for BoingBoing in an evening

January 27th, 2011 by Nicholas Tollervey. Filed under Awesomeness, Essence, Howto, Programming, Progress.

Yesterday the folks over at boingboing.net posted eleven year’s worth of posts as a zipped up XML file. XML is good, but having a searchable database of posts is better. So I (ntoll) am in the process of importing all the data into Fluidinfo. :-)

When finished, every post and author in the boingboing data dump will be represented by an object in Fluidinfo and tagged with useful information. The diagram below shows a representation of what a typical object about a boingboing.net post looks like:

Tags on an object representing a boingboing.net post.

The object (the red blob with a unique ID written inside it) has several tags attached to it (named “boingboing.net/author” and “boingboing.net/comment_count” for example) with associated values (“Mark Frauenfelder” and “53” respectively).

Furthermore, while I was cleaning/preparing the data for upload I made sure to extract every domain name and URL referenced in each post and annotate the publication date as computer friendly values rather than just a human readable date.

An instant win is the ability to query data. For example, you’ll be able to search for all posts that link to techcrunch.com written in 2010 by Cory Doctorow. This is how to write the query in Fluidinfo’s super simple query language:

boingboing.net/domains contains "techcrunch.com" and
boingboing.net/year = 2010 and
boingboing.net/author = "Cory Doctorow"

The result will depend on how you make the query, but let’s assume you’re using a /values based call in Fluidinfo’s REST api and you’ve asked for each post’s title, publication date and a list of domains mentioned. You’ll get back some JSON encoded data that looks something like this:

[
  "results" : {
        "id" : {
            "05eee31e-fbd1-43cc-9500-0469707a9bc3" : {
                "boingboing.net/title" : {
                    "value" : "This is a made up title for illustrative purposes"
                },
                "boingboing.net/created_on" : {
                    "value" : "2010-08-19 13:23:41"
                },
                "boingboing.net/domains" : {
                    "value": [
                        "techcrunch.com",
                        "microsoft.com"
                    ]
                }
            },
            "0521e31e-fbd1-43cc-9500-046974569bc3" : {
               … more results …
            }
        }
    }
  }
]
 


api

Wait a minute..!?!? This is just as if boingboing.net had an API.

Actually, by importing the flat XML file into Fluidinfo they do have an API – for free! Because of Fluidinfo’s open nature anyone can now make use of boingboing’s data via a few simple and easy to construct RESTful calls to Fluidinfo.

But that’s not all..!

Fluidinfo isn’t just openly readable – it’s openly writeable too.

Huh..?

Any user of Fluidinfo can tag data to any object. For example, I control a couple of tags called “ntoll/rating” and “ntoll/comment” which I could attach to any of the objects representing boingboing.net posts. By tagging an object with associated values I’m indicating what I thought about the post.

Importantly, I know which object I want to tag because it has a special unique tag called “about” whose value is the URL to the boingboing.net post in question. Other people who want to add information about this post will know to use the same object as me because the about tag-value tells them, er, what the object is about.

This brings me to the killer point: accessing data from boingboing.net is good, but the facility to annotate, discover and re-use everyone’s data about boingboing.net posts is better. That’s why we sometimes say we’re trying to do to databases what Wikipedia did to encyclopaedias.

Users of Fluidinfo don’t have to retrieve information about boingboing.net posts by building queries using just boingboing.net tags. It’s possible to search using other people’s tags. For example, here’s how to search for posts where I’ve given it a relatively high rating and added a comment:

ntoll/rating > 6 and has ntoll/comment and
has boingboing.net/title

And users don’t have to just ask for boingboing.net related tag-values either. It’s possible to ask objects for all their tags that you have permission to see. For example, you could retrieve a matching post’s title, body, author and any comments I make about the post with the ntoll/comment tag.

I’m only scratching the surface here so I’ll follow up with another post soon with some example code and use cases. In the meantime, if you want to find out more feel free to get in touch with us. We’re more than happy to help.

If you’re a developer and want to play with the boingboing.net data you should take a read of my last post explaining how to explore Fluidinfo’s API with Python.

In case you were wondering, it really was only half an evening’s work to prepare the data and write the import script. :-)

Note: The import is currently running but should be complete later this afternoon. Not all posts will be in Fluidinfo yet (so far we have everything up to the end of September 2008).

Image credits: Diagram generated by abouttag written by Nick Radcliffe and the “API Sign” is © 2006 ulybug under a Creative Commons license.