ReadWriteWeb ReadWriteAPI

February 23rd, 2011 by Nicholas Tollervey. Filed under APIs, Data, Writable APIs.

Over the weekend I scraped the 11300 or so articles in the ReadWriteWeb archive. These are a great source of technology news and analysis covering stories from 2003 to the present day. Rather than keep this to myself (and rather unsurprisingly) I imported the metadata about each article into Fluidinfo. Hey presto, another instant API emerges!

Here’s how it works. For each article in the ReadWriteWeb archive there is an object in Fluidinfo. Each object has a unique “about” tag-value: the URL of the article. Furthermore, each object is annotated with information using tags found under the top level namespace. Tags include title, extract, date, categories and so on. In other words, you might visualize each object something like this:

I’ve also created and annotated objects about each of the authors of ReadWriteWeb articles and tagged objects representing each website ever mentioned by ReadWriteWeb.

So, it’s now possible to use the API like this:

>>> import fluidinfo
>>> returnTags = ['', '', '', '', ]
>>> query = " = 2010 and = 5 and = 5"
>>> head, result ='GET', '/values', tags=returnTags, query=query)
>>> head['status']
>>> result
            {u'': {u'value': u'Sarah Perez'},
              u'': {u'value': u'May  5, 2010  7:24 AM'},
              u'': {u'value': u"Feel like hacking your phone today? If you've got about 10 minutes to spare, you can turn your iPhone into a Wi-Fi hotspot using a combination of the ..."},
              u'': {u'value': u'How To Turn Your iPhone into a Wi-Fi Hotspot'}},
        ... etc....

What’s just happened..? I used a client library ( to ask Fluidinfo to return the author name, publication date, title and an extract of all ReadWriteWeb articles published on the 5th May 2010.

Being able to search and extract data from an API is cool, especially since you get this by virtue of simply hosting your data in Fluidinfo. But this is ReadWriteWeb we’re talking about. Happily, Fluidinfo can accommodate.

>>> fluidinfo.login('ntoll', 'mysecretpassword') # change as appropriate
>>> headers, result ='PUT', ['about', '', 'ntoll', 'rating'], 10)
>>> headers
{'cache-control': 'no-cache',
 'connection': 'keep-alive',
 'content-type': 'text/html',
 'date': 'Wed, 23 Feb 2011 15:07:29 GMT',
 'server': 'nginx/0.7.65',
 'status': '204'}

The example above shows how I sign in and annotate the object “about” the article with a tag called ntoll/rating and an associated value of 10 (obviously I enjoyed this article). The HTTP 204 response status tells me the value was successfully tagged.

Let’s just pause here for a moment and consider what I’ve just been able to do. Because Fluidinfo is openly writable I’m able to annotate the objects about ReadWriteWeb articles with my own data. Since objects in Fluidinfo don’t have owners or permissions attached to them I didn’t have to ask ReadWriteWeb for permission to augment the data about the article in question. Furthermore, if I only want my buddies to see what my ratings are I can set the tag to be only visible to a specific group of people. In this way Fluidinfo remains openly writable yet I still retain ownership and control over my data.

We’ve seen “read” and “write”, but what about “web”..?

Well it turns out I can stretch this analogy even further. Because everyone is tagging the same objects (identified by their “about” tag values) the data is being linked by virtue of the context of the object. We’re starting to get a web of linked data (yeah, I know, bear with me on this one…).

Since I can search and retrieve using any of the tags for which I have “read” permission I can start to create really cool mash-ups of data like this:

>>> header, result ='GET', '/values', tags=['fluiddb/about', '', ''], query="has and has and has")
>>> header
{'cache-control': 'no-cache',
 'connection': 'keep-alive',
 'content-length': '23528',
 'content-location': '',
 'content-type': 'application/json',
 'date': 'Wed, 23 Feb 2011 15:24:36 GMT',
 'server': 'nginx/0.7.65',
 'status': '200'}
>>> len(result['results']['id'])
>>> for r in result['results']['id'].values():
...     print r['fluiddb/about']['value']

What..? I’ve just asked Fluidinfo for all the articles from BoingBoing and ReadWriteWeb about companies backed by Union Square Ventures that both BoingBoing and ReadWriteWeb have covered. It turns out there are four companies: Twitter, Etsy, Boxee and Meetup.

What do one of these results look like..?

    {u'value': [u'',
    {u'value': u''},
    {u'value':  [u'']}}

What was involved in making such a cool query possible..? Simply importing data into Fluidinfo.

I’ll say no more and let you ponder the implications of what I’ve just demonstrated…