How we made an API for BoingBoing in an evening

January 27, 2011


Filed under: Awesomeness,Essence,Howto,Programming,Progress — Nicholas Tollervey @ 9:05 am

Yesterday the folks over at boingboing.net posted eleven years' worth of posts as a zipped-up XML file. XML is good, but having a searchable database of posts is better. So I (ntoll) am in the process of importing all the data into Fluidinfo. 🙂

When finished, every post and author in the boingboing data dump will be represented by an object in Fluidinfo and tagged with useful information. The diagram below shows a representation of what a typical object about a boingboing.net post looks like:

The object (the red blob with a unique ID written inside it) has several tags attached to it (named “boingboing.net/author” and “boingboing.net/comment_count” for example) with associated values (“Mark Frauenfelder” and “53” respectively).

Furthermore, while I was cleaning and preparing the data for upload, I made sure to extract every domain name and URL referenced in each post, and to store the publication date as computer-friendly values rather than just a human-readable date.
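To give a flavour of that preparation step, here is a minimal sketch in Python. It is not the actual import script: the regular expression, the function names, the date format and the "month" and "day" keys are all assumptions made for illustration (only the domains and year values are mentioned in this post).

```python
import re
from datetime import datetime
from urllib.parse import urlparse

# Deliberately simple URL matcher (an assumption; a real script may be stricter).
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")


def extract_links(body):
    """Return (urls, domains) found in a post body, de-duplicated, in order."""
    urls = []
    domains = []
    for match in URL_PATTERN.findall(body):
        # Drop punctuation that commonly trails a URL in running text.
        url = match.rstrip(".,;:!?)")
        if url not in urls:
            urls.append(url)
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        if domain and domain not in domains:
            domains.append(domain)
    return urls, domains


def date_values(created_on):
    """Split a 'YYYY-MM-DD HH:MM:SS' string (assumed format) into numbers."""
    stamp = datetime.strptime(created_on, "%Y-%m-%d %H:%M:%S")
    return {"year": stamp.year, "month": stamp.month, "day": stamp.day}


# A made-up post body for illustration.
body = ('See <a href="http://www.techcrunch.com/2010/08/story">this</a> '
        'and http://microsoft.com/windows.')
urls, domains = extract_links(body)
dates = date_values("2010-08-19 13:23:41")
```

Storing the year as a number rather than as part of a string is what makes a comparison such as "year = 2010" possible in a query.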

An instant win is the ability to query data. For example, you’ll be able to search for all posts that link to techcrunch.com written in 2010 by Cory Doctorow. This is how to write the query in Fluidinfo’s super simple query language:

boingboing.net/domains contains "techcrunch.com" and boingboing.net/year = 2010 and boingboing.net/author = "Cory Doctorow"

The result will depend on how you make the query, but let’s assume you’re using a /values-based call in Fluidinfo’s REST API and you’ve asked for each post’s title, publication date and a list of domains mentioned. You’ll get back some JSON-encoded data that looks something like this:

{
    "results" : {
        "id" : {
            "05eee31e-fbd1-43cc-9500-0469707a9bc3" : {
                "boingboing.net/title" : {
                    "value" : "This is a made up title for illustrative purposes"
                },
                "boingboing.net/created_on" : {
                    "value" : "2010-08-19 13:23:41"
                },
                "boingboing.net/domains" : {
                    "value" : [
                        "techcrunch.com",
                        "microsoft.com"
                    ]
                }
            },
            "0521e31e-fbd1-43cc-9500-046974569bc3" : {
                ... more results ...
            }
        }
    }
}
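As a rough sketch of how a client might build such a request and unpack the response, here is some Python. The host name is a placeholder, and the "query" and "tag" parameter names are assumptions about the /values call rather than something this post spells out, so check them against the API documentation before relying on them. No network request is made; the sample payload simply mirrors the shape shown above.

```python
import json
from urllib.parse import urlencode

# Placeholder host (an assumption): substitute the real Fluidinfo endpoint.
BASE_URL = "https://fluidinfo.example"


def values_url(query, tags, base_url=BASE_URL):
    """Build a /values URL for a query and the tags wanted back."""
    # Assumes one "query" parameter and one repeated "tag" parameter per tag.
    params = [("query", query)] + [("tag", tag) for tag in tags]
    return base_url + "/values?" + urlencode(params)


def flatten_results(payload):
    """Turn {"results": {"id": {...}}} into a list of plain dictionaries."""
    rows = []
    for object_id, tag_map in payload["results"]["id"].items():
        row = {"id": object_id}
        for tag_path, info in tag_map.items():
            row[tag_path] = info["value"]
        rows.append(row)
    return rows


query = ('boingboing.net/domains contains "techcrunch.com" '
         'and boingboing.net/year = 2010 '
         'and boingboing.net/author = "Cory Doctorow"')
url = values_url(query, ["boingboing.net/title",
                         "boingboing.net/created_on",
                         "boingboing.net/domains"])

# A sample response with the same shape as the example above.
sample = json.loads("""
{"results": {"id": {"05eee31e-fbd1-43cc-9500-0469707a9bc3": {
    "boingboing.net/title": {"value": "A made up title"},
    "boingboing.net/created_on": {"value": "2010-08-19 13:23:41"},
    "boingboing.net/domains": {"value": ["techcrunch.com", "microsoft.com"]}
}}}}
""")
rows = flatten_results(sample)
```

Each row ends up as one dictionary per post, keyed by tag path, which is usually easier to work with than the nested response.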

Wait a minute..!?!? This is just as if boingboing.net had an API.

Actually, now that the flat XML file has been imported into Fluidinfo, they do have an API – for free! Because of Fluidinfo’s open nature, anyone can now make use of boingboing’s data via a few simple, easy-to-construct RESTful calls to Fluidinfo.

But that’s not all..!

Fluidinfo isn’t just openly readable – it’s openly writeable too.

Huh..?

Any user of Fluidinfo can tag data to any object. For example, I control a couple of tags called “ntoll/rating” and “ntoll/comment” which I could attach to any of the objects representing boingboing.net posts. By tagging an object with associated values I’m indicating what I thought about the post.

Importantly, I know which object I want to tag because it has a special unique tag called “about” whose value is the URL to the boingboing.net post in question. Other people who want to add information about this post will know to use the same object as me because the about tag-value tells them, er, what the object is about.
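The idea can be illustrated with a toy, in-memory model in Python. This is not Fluidinfo's implementation or client API: the class, its methods, the post URL and the second user "alice" are all hypothetical, and exist only to show how a shared about value leads different people to the same object.

```python
import uuid


class ToyStore:
    """A toy stand-in for a shared object store (not the real Fluidinfo)."""

    def __init__(self):
        self._id_by_about = {}   # about value -> object id
        self._tags_by_id = {}    # object id -> {tag path: value}

    def object_for(self, about):
        """Return the id of the single object with this about value."""
        if about not in self._id_by_about:
            object_id = str(uuid.uuid4())
            self._id_by_about[about] = object_id
            self._tags_by_id[object_id] = {"about": about}
        return self._id_by_about[about]

    def tag(self, about, tag_path, value):
        """Attach a tag value to the object identified by its about value."""
        object_id = self.object_for(about)
        self._tags_by_id[object_id][tag_path] = value
        return object_id

    def tags_on(self, about):
        """Return a copy of every tag value on the object."""
        return dict(self._tags_by_id[self.object_for(about)])


store = ToyStore()
post_url = "http://boingboing.net/2010/08/19/made-up-post.html"  # made up

# Two different users tag "the post at this URL" without coordinating.
first_id = store.tag(post_url, "ntoll/rating", 8)
second_id = store.tag(post_url, "alice/comment", "Worth a read")
```

Because both calls resolve the same about value, both tags land on one object, and anyone who knows the post's URL can find them.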

This brings me to the killer point: accessing data from boingboing.net is good, but the facility to annotate, discover and re-use everyone’s data about boingboing.net posts is better. That’s why we sometimes say we’re trying to do to databases what Wikipedia did to encyclopaedias.

Users of Fluidinfo don’t have to retrieve information about boingboing.net posts by building queries using just boingboing.net tags. It’s possible to search using other people’s tags. For example, here’s how to search for posts that I’ve given a relatively high rating and commented on:

ntoll/rating > 6 and has ntoll/comment and has boingboing.net/title

And users don’t have to just ask for boingboing.net related tag-values either. It’s possible to ask objects for all their tags that you have permission to see. For example, you could retrieve a matching post’s title, body, author and any comments I make about the post with the ntoll/comment tag.
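To make the semantics of that query concrete, here is a small Python sketch that applies the same conditions to some made-up, in-memory objects and then picks out tags from more than one namespace. It only mimics what the query means; it is not how Fluidinfo evaluates queries, and the sample data and function names are hypothetical.

```python
# Made-up objects: each is a mapping of tag path -> value.
posts = [
    {"boingboing.net/title": "First made up title",
     "boingboing.net/author": "Cory Doctorow",
     "ntoll/rating": 9,
     "ntoll/comment": "Loved this one"},
    {"boingboing.net/title": "Second made up title",
     "boingboing.net/author": "Mark Frauenfelder",
     "ntoll/rating": 4,
     "ntoll/comment": "Not for me"},
    {"boingboing.net/title": "Third made up title",
     "boingboing.net/author": "Cory Doctorow",
     "ntoll/rating": 7},  # rated, but no comment
]


def matches(tags):
    """ntoll/rating > 6 and has ntoll/comment and has boingboing.net/title"""
    rating = tags.get("ntoll/rating")
    return (rating is not None and rating > 6
            and "ntoll/comment" in tags
            and "boingboing.net/title" in tags)


def select(tags, wanted):
    """Keep only the requested tags that are present on the object."""
    return {path: tags[path] for path in wanted if path in tags}


wanted = ["boingboing.net/title", "boingboing.net/author", "ntoll/comment"]
results = [select(tags, wanted) for tags in posts if matches(tags)]
```

Note that the conditions and the requested tags are independent: the query filters on ntoll tags, while the values returned mix boingboing.net and ntoll tags.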

I’m only scratching the surface here so I’ll follow up with another post soon with some example code and use cases. In the meantime, if you want to find out more feel free to get in touch with us. We’re more than happy to help.

If you’re a developer and want to play with the boingboing.net data, you should read my last post explaining how to explore Fluidinfo’s API with Python.

In case you were wondering, it really was only half an evening’s work to prepare the data and write the import script. 🙂

Note: The import is currently running but should be complete later this afternoon. Not all posts will be in Fluidinfo yet (so far we have everything up to the end of September 2008).


13 Comments »

[…] This post was mentioned on Twitter by Nicholas Tollervey, Nick Radcliffe and John Chandler, David de Weerdt. David de Weerdt said: RT @ntoll: New blog post about #fluiddb: "How we made an API for BoingBoing in an evening" http://bit.ly/fxTG0K […]

Pingback by Tweets that mention FluidDB » Blog Archive » How we made an API for BoingBoing in an evening -- Topsy.com — January 27, 2011 @ 9:35 am
[…] Read more here Posted in Uncategorized , interesting, science, tech | No Comments » […]

Pingback by How we made an API for BoingBoing in an evening with FluidDB « Interesting Tech — January 27, 2011 @ 10:44 am
Amazing! Just wow.

Comment by beschizza — January 27, 2011 @ 4:12 pm
[…] As predicted, a reader has already created an API to access all of Boing Boing’s archives (overnight). Wow. Talk about getting great work, fast, […]

Pingback by 1% Blog » Why Boing Boing Gets It — January 27, 2011 @ 3:24 pm
[…] at FluidDB collected all the information in the XML file into their centralized database system. ntoll’s post on the FluidDB Boing Boing repository explains a little bit about the structure of their system and how to access it as an API for use in […]

Pingback by Publicly accessible and mutable Boing Boing API compiled overnight | Battery and Charger Forum — January 27, 2011 @ 5:01 pm
[…] Eleven years of Boing Boing posts available in [XML], [JSON] and via [FluidDB] […]

Pingback by Update on the Boing Boing post release for your weekend project | Battery and Charger Forum — January 28, 2011 @ 11:00 pm
Amazing. And you just made me have an idea for other project.

Comment by Criação de Sites — January 30, 2011 @ 5:06 pm
Great! Glad you like it. Come hang out in #fluidinfo if you feel like chatting or want help/advice with the API or one of the client-side libraries. We’d love to help.

Comment by terrycojones — January 30, 2011 @ 5:25 pm
🙂 Thanks!

Comment by terrycojones — January 30, 2011 @ 5:26 pm
[…] recent entry on Simon Willison’s blog, How we made an API for BoingBoing in an evening caught my […]

Pingback by Topic Mapping BoingBoing Data? « Another Word For It — February 8, 2011 @ 6:16 am
Hey guys,

This is some serious good stuff!

Made me wonder though – do you plan on importing more blogs into your DB, or is there an easy way for other people to do something like this? The reason I’m asking is that there are billions of blogs out there that have public archives.. yet any kind of data mining in those archives becomes a hell of a task because of all the different HTML structures involved, so basically there is no way to properly scrape the data off the sites. However, considering that the data IS public per se, it should be in the interest of both blog authors and the many content-based app developers out there to have some sort of structured access to such data like blog archives. (Hey, I’d be happy if anybody provided an archive of RSS feeds, but amazingly it appears that nobody has thought of something like that.)

To summarize, I think that you appear to be in a position to do a lot of good with approaches like this, if you’d be interested to push wherever it is necessary to make this happen 😉

Cheers and keep up the good work,
Martin

Comment by Martin Baku — February 9, 2011 @ 8:23 pm
[…] days ago, a well known online information warehouse called Fluidinfo (formerly known as FluiDB) has announced that they have created an API for BoingBoing, a popular mainstream blog. Now any common Web user […]

Pingback by Every Blog should have a right to an API! « Topify! — February 9, 2011 @ 3:50 pm
an openly writable API?
→”Fluidinfo can tag data to any object.”

Comment by courtneyBolton — August 8, 2011 @ 3:25 pm
