Fluidinfo

February 14, 2011

What is a writable API?

Filed under: APIs,Essence,Writable APIs — Terry Jones @ 9:46 am

When we released the Fluidinfo API for Boing Boing two weeks ago, Simon Willison noted on his blog:

“Fluidinfo really is a fascinating piece of software.” …. “Writable APIs are much less common than read-only APIs – Fluidinfo instantly provides both.”

If you search online to try to discover what people mean by a “writable API”, it’s hard to find anything that merits the name. So what did Simon mean? What is a writable API?

Both Simon and the team at Fluidinfo think “writable API” should be a kind of shorthand for an API that provides access to underlying data that is writable. This is not meant in the trivial already-possible sense wherein you pass data to an API method that stores them into a database you can’t otherwise access. We mean it in a more fundamental sense: that the underlying data is writable. That anyone or any application can directly access the data storage layer and add new information to it – without the knowledge of the people who stored the original data. That sounds pretty radical. But if you have a model of control in which objects are not owned but their pieces are, it’s not scary at all. In fact it’s liberating.

And, you guessed it, Fluidinfo has exactly that model of control. Any information stored into Fluidinfo instantly has a writable API in the sense just described. Let’s see a concrete example from the recent Boing Boing data imported into Fluidinfo.

Below is an illustration of an object in Fluidinfo, showing a subset of the tags that are on every Fluidinfo object representing a Boing Boing article. (The image was generated using Nick Radcliffe’s fun About Tag image generator for Fluidinfo objects. Click the image to see all its tags.)

An object

Simply by virtue of being stored in Fluidinfo, Boing Boing instantly got an API for all their articles. The API lets you find Boing Boing articles, as represented by objects in Fluidinfo, via querying on tags such as those shown on the object above. For example, you can use the API to find Boing Boing articles published in December 2008 that were written by Cory Doctorow. Or you can get a list of all the Boing Boing articles that contain a reference to the domain www.whitehouse.gov. (You can see details of these sorts of queries in our article on Mining the Boing Boing API.)

Those kinds of searches on Boing Boing data were not previously possible. We put the whole thing together in a single evening, which illustrates how simple it can be to make a Fluidinfo-fueled API for your own information. As cool as these examples are, though, they’re just reading & searching Boing Boing controlled data, as with a traditional API. What about writing?

Writing the Boing Boing data – without stopping to ask permission

The tags on the object above were put there by the Fluidinfo user named boingboing.net. That user controls those tags, and has given the rest of us read permission. But no-one owns the Fluidinfo object that the tags are on. As a result, anyone with a Fluidinfo account (sign up here) can add any information to the exact same object.

To give a very simple example, suppose someone wrote a simple browser extension (or extensions) that let Boing Boing readers mark stories as being funny or not suitable for work. Two users, Alice and Bob, might then put alice/funny and bob/nsfw tags onto the above object. Assuming I had read permission on those tags, I could then find Boing Boing articles by Cory Doctorow that Alice enjoyed and Bob found too risqué for work. Someone else could write a browser extension that popped up a warning about NSFW content based on Bob’s tag. In fact, if you look closely at the object above, you’ll see that I have added a terrycojones/nsfw tag to it (terrycojones is my username in Fluidinfo).
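As a rough sketch, the query behind that Alice-and-Bob search could be put together like this. The `has` and `and` keywords come from Fluidinfo's query language. The author tag name is a guess on my part, `alice/funny` and `bob/nsfw` are the hypothetical tags from the example, and `has_all` is just a helper invented for illustration.

```python
def has_all(*tag_paths):
    """Build a query that matches objects carrying every one of the given tags."""
    return " and ".join("has " + path for path in tag_paths)


query = has_all(
    "boingboing.net/authors/corydoctorow",  # assumed name of the author tag
    "alice/funny",                          # hypothetical tag from the example
    "bob/nsfw",                             # hypothetical tag from the example
)
print(query)
```

The point is that tags from three different users sit side by side in one query, because they all live on the same objects.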

That’s customization and personalization – in our hands. It’s adding data to the exact same objects that Boing Boing created, combining their data and ours as we please, and all without stopping to ask permission or requiring that a database administrator or programmer anticipate our idiosyncratic needs. Boing Boing, and any applications they create, may not be aware of, care about, or even be able to detect the new data (depending on permissions).

In other words, we can say that Boing Boing has a writable API, because other people and other applications are always free to add information to the same objects that the Boing Boing API is providing access to. The same applies to any application or API that uses Fluidinfo. A writable API opens the door onto a very different world, allowing unlimited possibilities for mash-ups, new applications, extensions, widgets, etc. It allows arbitrary customization and personalization. Fluidinfo acts like a universal metadata engine, providing guaranteed write access to anything, with a permissions system at the level of the tag, not the object.

We’ll give another example of a simple but fun writable API tomorrow. Next week we’ll release a much more substantial one at the LAUNCH conference in San Francisco. We’re really excited about it, and have a series of not-to-be-missed upcoming blog posts on what we’ve been up to.

Stay tuned!

February 12, 2011

Top data blogs information now in Fluidinfo, with an API

Filed under: Data,Howto — Terry Jones @ 6:12 pm

Image: Education Week

A bit over a month ago, Marshall Kirkpatrick of Read Write Web made lists of the Top 300 blogs about data and the Top 300 blogs about geo. As soon as I saw the lists, I added the data to Fluidinfo and emailed Marshall.

I added marshallk.com/top-blogs/data and marshallk.com/top-blogs/geo tags to the Fluidinfo objects that correspond to the URLs in his lists. (Fluidinfo has an object for everything; in each case I put the tags onto the logical object in Fluidinfo: the one object whose fluiddb/about value is the URL in question.)


You can then do things like this:

$ curl 'http://fluiddb.fluidinfo.com/values?query=marshallk.com/top-blogs/data%3c%3d10&tag=fluiddb/about&tag=marshallk.com/top-blogs/data' |
jsongrep.py results id '.*'
{u'fluiddb/about': {u'value': u'http://www.readwriteweb.com/cloud'},
 u'marshallk.com/top-blogs/data': {u'value': 3}}
{u'fluiddb/about': {u'value': u'http://cloud.gigaom.com'},
 u'marshallk.com/top-blogs/data': {u'value': 2}}
{u'fluiddb/about': {u'value': u'http://flowingdata.com'},
 u'marshallk.com/top-blogs/data': {u'value': 8}}
{u'fluiddb/about': {u'value': u'http://highscalability.com'},
 u'marshallk.com/top-blogs/data': {u'value': 7}}
{u'fluiddb/about': {u'value': u'http://www.calculatedriskblog.com'},
 u'marshallk.com/top-blogs/data': {u'value': 6}}
{u'fluiddb/about': {u'value': u'http://www.fivethirtyeight.com'},
 u'marshallk.com/top-blogs/data': {u'value': 5}}
{u'fluiddb/about': {u'value': u'http://www.guardian.co.uk/news/datablog'},
 u'marshallk.com/top-blogs/data': {u'value': 9}}
{u'fluiddb/about': {u'value': u'http://www.informationisbeautiful.net'},
 u'marshallk.com/top-blogs/data': {u'value': 10}}
{u'fluiddb/about': {u'value': u'http://www.zerohedge.com'},
 u'marshallk.com/top-blogs/data': {u'value': 1}}
{u'fluiddb/about': {u'value': u'http://freakonomics.com/blog'},
 u'marshallk.com/top-blogs/data': {u'value': 4}}

Those are the top 10 on Marshall’s data list (unsorted, obviously). I’ve cleaned up the output using my jsongrep.py program described and available here.
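If you'd rather see them in rank order, a few lines of Python will do it. This is only a sketch: it assumes the raw JSON response has the `results` / `id` shape that the `/values` endpoint returns, the sample is trimmed to three entries from the output above, and the object IDs (`id-1` and so on) are placeholders, not real Fluidinfo IDs.

```python
import json

# Trimmed sample in the shape of a /values response; the IDs are placeholders.
response = json.loads("""
{"results": {"id": {
  "id-1": {"fluiddb/about": {"value": "http://www.readwriteweb.com/cloud"},
           "marshallk.com/top-blogs/data": {"value": 3}},
  "id-2": {"fluiddb/about": {"value": "http://cloud.gigaom.com"},
           "marshallk.com/top-blogs/data": {"value": 2}},
  "id-3": {"fluiddb/about": {"value": "http://www.zerohedge.com"},
           "marshallk.com/top-blogs/data": {"value": 1}}
}}}
""")

RANK_TAG = "marshallk.com/top-blogs/data"


def ranked(response, rank_tag):
    """Return (rank, url) pairs sorted by rank."""
    rows = response["results"]["id"].values()
    pairs = [(row[rank_tag]["value"], row["fluiddb/about"]["value"]) for row in rows]
    return sorted(pairs)


for rank, url in ranked(response, RANK_TAG):
    print(rank, url)
```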

More interestingly, you can see if any sites are on both of Marshall’s lists:

$ curl 'http://fluiddb.fluidinfo.com/values?query=has%20marshallk.com/top-blogs/data%20and%20has%20marshallk.com/top-blogs/geo&tag=fluiddb/about'
{"results": {"id": {"a2e56723-453a-44e5-bd91-5576d0615c8e": {"fluiddb/about": {"value": "http://blog.simplegeo.com"}}}}}

Just a single blog is in both lists: http://blog.simplegeo.com.

So far, so good.

About half an hour ago, I saw a tweet from Daniel Tunkelang (the mind behind TunkRank) saying that eCairn have just released some work based on Marshall’s data, producing a list of 500 top data blogs! Cool.

So I’ve just imported that data to Fluidinfo too, adding an ecairn.com/top-data-blogs tag to the object for each URL on their list. The value of each tag, as with Marshall’s data, is the ranking on the eCairn list.

Let’s see how many blogs are on both lists:

$ curl 'http://fluiddb.fluidinfo.com/values?query=has%20marshallk.com/top-blogs/data%20and%20has%20ecairn.com/top-data-blogs&tag=fluiddb/about' | jsongrep.py results id '.*' | wc -l
117

Not as many as I expected. But there are some small differences in the URLs used, for example Marshall’s list had http://kaushik.net/avinash whereas the eCairn list has http://www.kaushik.net/avinash. This would be easy to clean up, and of course it’s also possible just to tag the object for both URLs in Fluidinfo.
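Here's one way the clean-up might look. It's a sketch, not something I ran against the real lists: `normalize` is a made-up helper, and treating hosts with and without a leading `www.` as the same site is an assumption that won't hold for every URL.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Return a comparison key: lower-cased host without 'www.', plus the path."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


a = normalize("http://kaushik.net/avinash")
b = normalize("http://www.kaushik.net/avinash")
print(a, a == b)
```

Comparing the two lists by these keys, not by the raw URLs, would catch the near-duplicates.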

You can do the Fluidinfo query has marshallk.com/top-blogs/data except has ecairn.com/top-data-blogs to see the sites that Marshall has in his list but which do not appear in the eCairn list, such as Marshall’s #12, http://blog.sqlauthority.com. eCairn’s calculation might have put them in the lower 500 of their list of 1000 (the eCairn article only gives their top 500). There are plenty of other interesting queries too, but this post is long enough already.
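If it helps to think in sets, `and` between two `has` clauses behaves like an intersection and `except` like a difference. The snippet below only mimics that with Python sets. Apart from Marshall's #12, the list membership is made up purely for illustration.

```python
# Made-up list membership, for illustration only (apart from Marshall's #12).
marshall = {
    "http://blog.sqlauthority.com",
    "http://example.com/on-both-lists",
}
ecairn = {
    "http://example.com/on-both-lists",
    "http://example.com/ecairn-only",
}

on_both = marshall & ecairn        # like: has marshallk.com/... and has ecairn.com/...
marshall_only = marshall - ecairn  # like: has marshallk.com/... except has ecairn.com/...
print(sorted(marshall_only))
```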

So there you go, a fun bit of playing with more data blog data with Fluidinfo. One of these days we’ll even make it into one of these lists 🙂

Here’s the tiny bit of Python code I just wrote to add the data. It uses the Python FOM library for Fluidinfo written by Ali Afshar:

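import sys
from fom.session import Fluid

fdb = Fluid()
fdb.login('ecairn.com', 'password')

# One URL per line on stdin; strip line endings and skip blank lines.
urls = [line.strip() for line in sys.stdin if line.strip()]

# Tag the object about each URL with its 1-based rank on the eCairn list.
for rank, url in enumerate(urls):
    fdb.about[url]['ecairn.com/top-data-blogs'].put(rank + 1)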

February 10, 2011

What the Post-It Note Can Teach Us About Apps and Data

Filed under: Essence,Events — Terry Jones @ 10:42 pm

On Feb 9th I gave a talk at the NYC Ignite event, titled “What the Post-It Note Can Teach Us About Apps and Data.” Below are my 20 slides (21 if you count image credits). These were advanced automatically every 15 seconds during the 5-minute talk. I’ll post a link to the video once it’s up.

While the topic may not seem to have anything to do with Fluidinfo, there is a very close connection. I’ll write about that another time.

February 9, 2011

Mining the BoingBoing API

Filed under: Awesomeness,Howto,Programming — Nicholas Tollervey @ 4:54 pm

With all the BoingBoing data from the past ten years now in Fluidinfo the next question is “what can we do with it..?”. That’s what I’ll be answering in this technical how-to, so expect lots of code / examples!

I’ve organised the article into four parts:

  1. Basic Fluidinfo concepts
  2. How BoingBoing data is organised
  3. Minecraft (example data mining interactions with the API)
  4. Super-duper cool stuff (this is the best bit!)

Basic Fluidinfo Concepts

Understanding Fluidinfo involves four simple concepts:

  1. Objects represent things.
  2. Tags define objects’ attributes.
  3. Namespaces organise tags.
  4. Permissions apply to namespaces and tags.

How does this all fit together..? Objects are simply tagged with data. Put another way, tags associate a value with an object.

The other important concept to make clear is that nobody owns objects, there are no permissions associated with objects, and objects last forever. Although every object has a unique ID, objects are also usually identified by a globally unique and immutable “about” tag value. It’s used as you’d expect: to indicate what the object is supposed to be about. Finally, anyone can add data to any object (more on this later).

(er… that’s really all it is.)
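If you like to see ideas as code, here is a toy model of those four concepts. It is emphatically not how Fluidinfo is implemented: `ToyStore` is invented for illustration, and the rule that only a namespace's owner may write under it is a simplifying assumption (real permissions are configurable per namespace and tag).

```python
class ToyStore:
    """Toy in-memory model of objects, tags and namespaces (not Fluidinfo's code)."""

    def __init__(self):
        # about value -> {tag path -> value}; nobody owns these entries.
        self.objects = {}

    def put(self, user, about, tag_path, value=None):
        # Simplifying assumption: only the owner of the top-level namespace
        # may write tags under it. Real permissions are configurable.
        namespace = tag_path.split("/", 1)[0]
        if namespace != user:
            raise PermissionError("%s may not write %s" % (user, tag_path))
        self.objects.setdefault(about, {})[tag_path] = value

    def tags_on(self, about):
        """List the tag paths currently attached to the object."""
        return sorted(self.objects.get(about, {}))


store = ToyStore()
url = "http://boingboing.net/2000/01/21/street-tech-reviews-.html"
store.put("boingboing.net", url, "boingboing.net/title", "Street Tech Reviews and news")
store.put("ntoll", url, "ntoll/tuba", "Umpah-tastical, man!")  # same object, different owner

try:
    store.put("ntoll", url, "boingboing.net/title", "Hijacked")
    denied = False
except PermissionError:
    denied = True

print(store.tags_on(url), denied)
```

Two users end up with tags on the same object, yet neither can overwrite the other's tag. The permission check is on the tag, never on the object.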

Of course, since Fluidinfo is a data-store it is possible to do searches, link objects and store all sorts of different types of data (from primitive types like numbers, booleans and text to more opaque values such as images, video, sound and other binary data).

Oh yeah, interaction with the data is via a simple yet powerful REST API. There are plenty of client libraries in many different languages which allow you to work without worrying about the dirty implementation details.

How the BoingBoing data is organised in Fluidinfo

Each of the 64,000 BoingBoing articles is represented by a corresponding Fluidinfo object whose about tag value is the URL of the original post on boingboing.net. In the original XML dump, each post looked something like this:

    <row>
        <permalink>http://boingboing.net/2000/01/21/street-tech-reviews-.html</permalink>
        <created_on>2000-01-21 14:07:38</created_on>
        <basename>street_tech_reviews_</basename>
        <author>Mark Frauenfelder</author>
        <title>Street Tech Reviews and news</title>
        <body><A HREF="http://www.streettech.com/">Street Tech</A> Reviews and news for gadget-lovers and propeller heads of all stripes.</body>
        <body_more>NULL</body_more>
        <comment_count>0</comment_count>
        <categories>NULL</categories>
    </row>

I’ve done the simplest thing possible: created a top-level boingboing.net namespace in Fluidinfo under which all tags used to annotate BoingBoing data are defined. I’ve added tags to this namespace that map to the original XML elements: permalink, created_on, basename, author, title, body, body_more, comment_count and categories. The Fluidinfo objects representing BoingBoing posts have data associated with them using these tags. For example, the object representing the post described in the XML example above has a boingboing.net/title tag with the associated value: “Street Tech Reviews and news”.

Since I was also cleaning the raw XML I decided to extract / re-structure some of the data. This resulted in some additional tags: year, month, day, timestamp, links and domains. The function of the date related tags should be clear. The links and domains tags are interesting because I scraped all the anchor tags in the body and body_more fields and processed the href values. Obviously the links tag references a list of all the URLs referenced in an article and the domains tag references a related list containing just the domain names.
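For the curious, the derivation of those extra tags can be sketched with nothing but the standard library. This is a reconstruction, not the import script I actually used: `LinkCollector` and `derive_tags` are illustrative names, the domain is kept exactly as it appears in the link, and the timestamp tag is left out because it depends on a time zone I'm not stating here.

```python
from datetime import datetime
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <A HREF=...> works.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def derive_tags(created_on, *html_fields):
    """Build the date, links and domains values for one post."""
    when = datetime.strptime(created_on, "%Y-%m-%d %H:%M:%S")
    collector = LinkCollector()
    for field in html_fields:
        collector.feed(field)
    collector.close()
    domains = sorted({urlsplit(link).netloc for link in collector.links})
    return {
        "year": when.year,
        "month": when.month,
        "day": when.day,
        "links": collector.links,
        "domains": domains,
    }


body = ('<A HREF="http://www.streettech.com/">Street Tech</A> Reviews and news '
        'for gadget-lovers and propeller heads of all stripes.')
tags = derive_tags("2000-01-21 14:07:38", body, "NULL")
print(tags)
```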

I did one final enhancement to the data dump. I extracted all the authors and categories and turned them into tags. When I imported the data I used these tags in the “delicious” way of tagging: simply by having such a tag (with no associated value) an object is associated with an author or category.
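To make that concrete: the author “Mark Frauenfelder” ends up as the valueless tag boingboing.net/authors/markfrauenfelder. The function below guesses at the naming rule from that one example (lower-case, letters and digits only); treat it as an assumption, not the rule the import really used.

```python
import re


def author_tag(author):
    """Guess the tag path for an author name (assumed rule: lower-case, a-z and 0-9 only)."""
    slug = re.sub(r"[^a-z0-9]", "", author.lower())
    return "boingboing.net/authors/" + slug


tag = author_tag("Mark Frauenfelder")
print("has " + tag)  # the kind of query clause such a tag makes possible
```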

Here’s what an object representing a BoingBoing article looks like:

An object

Another interesting view on the data is to explore the BoingBoing tags and namespaces in the Fluidinfo Explorer (see the screen-shot on the right). In the Explorer, if you right-click on a tag and select “Open Object” you’ll see the object that represents the tag in the main area of the application. This object is itself tagged with useful information – such as a description (containing copyright information). Yeah, I know, it sounds odd but this makes meta-tagging possible.

In addition to creating Fluidinfo objects for all the BoingBoing articles I also created an object for every domain referenced by BoingBoing throughout the last ten years.

The about tag value for these domain objects is the domain name itself. For example, there is an object about the “bbc.co.uk” domain.

Each of these domain objects has been tagged with a list of all the BoingBoing articles that mention them. This is, I think, rather cool. To continue the example, the bbc.co.uk domain was referenced in 177 BoingBoing articles.

Minecraft (example data mining interactions with the API)

So here comes the cool how-to stuff…

Should you need to, use the existing documentation to read about the Fluidinfo API in super-painfully-precise-techno-vision. However, I’m going to present a quick guided tour in the form of a Python session using the fluiddb.py module (remember my advice to use one of the client libs). The advantage of using fluiddb.py is that it’s a very thin layer on top of the HTTP API so you get a feel for how various things work. The other advantage is that reading Python is like reading pseudo-code and is thus a great teaching tool.

In the following example I simply import the fluiddb module and ask it for information about my user (ntoll). The basic pattern for calling Fluidinfo is: fluiddb.call("HTTP-VERB", "PATH IN API", OTHER OPTIONAL ARGS)

>>> import fluiddb # loads the module into the session
>>> headers, body = fluiddb.call('GET', '/users/ntoll') # 'GET' is the HTTP verb & '/users/ntoll' is the API path
>>> headers # contains the HTTP headers returned from Fluidinfo
{'cache-control': 'no-cache',
 'connection': 'keep-alive',
 'content-length': '76',
 'content-location': 'https://fluiddb.fluidinfo.com/users/ntoll',
 'content-type': 'application/json',
 'date': 'Tue, 08 Feb 2011 19:42:10 GMT',
 'server': 'nginx/0.7.65',
 'status': '200'}
>>> body # contains the actual result, in this case basic information about the user ntoll (me)
{u'id': u'a694f2d0-428e-4aaf-85d1-58e903f56b30',
 u'name': u'Nicholas Tollervey'}

Notice how the “content-location” in the headers tells you what the full URL of the API call is (this is interesting since fluiddb.py creates this automagically for you). The body (result) is a Python dict object that basically mirrors the JSON dict object Fluidinfo served up.

The following example grabs information about a specific object. Notice that I pass in the path to the Fluidinfo resource I’m GETting as a list. This ensures that the BoingBoing URL gets correctly percent encoded.

>>> headers, body = fluiddb.call('GET', ['about', 'http://boingboing.net/2000/01/21/street-tech-reviews-.html']) # get basic information about the object about "http://boingboing.net/2000/01/21/street-tech-reviews-.html"
>>> headers
{'cache-control': 'no-cache',
 'connection': 'keep-alive',
 'content-length': '455',
 'content-location': 'https://fluiddb.fluidinfo.com/about/http%3A%2F%2Fboingboing.net%2F2000%2F01%2F21%2Fstreet-tech-reviews-.html',
 'content-type': 'application/json',
 'date': 'Tue, 08 Feb 2011 19:45:27 GMT',
 'server': 'nginx/0.7.65',
 'status': '200'}
>>> body
{u'id': u'469257cf-2c33-4628-a97e-47166bae24fa',
 u'tagPaths': [u'boingboing.net/timestamp',
               u'fluiddb/about',
               u'boingboing.net/day',
               u'boingboing.net/month',
               u'boingboing.net/year',
               u'boingboing.net/authors/markfrauenfelder',
               u'boingboing.net/comment_count',
               u'boingboing.net/author',
               u'boingboing.net/basename',
               u'boingboing.net/body',
               u'boingboing.net/domains',
               u'boingboing.net/created_on',
               u'boingboing.net/permalink',
               u'boingboing.net/title',
               u'boingboing.net/links']}
>>>

Hopefully, the result speaks for itself: it contains the unique ID of the Fluidinfo object that is about the BoingBoing URL, and a list of the tags on that object. Getting the value of a specific tag is simple:

>>> headers, body = fluiddb.call('GET', '/objects/469257cf-2c33-4628-a97e-47166bae24fa/boingboing.net/title')
>>> body
u'Street Tech Reviews and news'

I simply appended the path to the tag onto the object’s unique ID (this also works with the about tag used in the prior example).
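If you're wondering what passing the path as a list buys you, here is a sketch of the idea: encode each component on its own, then join with slashes. This is not fluiddb.py's actual code. `build_url` is a stand-in written for this post, using the base URL that appears in the content-location headers above.

```python
from urllib.parse import quote

BASE = "https://fluiddb.fluidinfo.com"


def build_url(path_parts):
    """Percent-encode each path component separately, then join with '/'."""
    return BASE + "/" + "/".join(quote(part, safe="") for part in path_parts)


post = "http://boingboing.net/2000/01/21/street-tech-reviews-.html"
tag = "boingboing.net/title".split("/")  # a tag path is several components

by_about = build_url(["about", post])
title_by_about = build_url(["about", post] + tag)
title_by_id = build_url(["objects", "469257cf-2c33-4628-a97e-47166bae24fa"] + tag)
print(title_by_about)
```

The slashes inside the BoingBoing URL get encoded because that URL is a single component, while the slash in the tag path survives because the tag path is split first.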

Returning tag values for a set of results that match a query is also easy. The equivalent of the following SQL-esque query:

SELECT title, categories, created_on FROM boingboing.net WHERE authors="markfrauenfelder" AND year=2010;

… is:

>>> headers, body = fluiddb.call('GET', '/values', tags=['boingboing.net/title', 'boingboing.net/created_on', 'boingboing.net/categories'], query="has boingboing.net/authors/markfrauenfelder and boingboing.net/year=2010")

A call is made to the “/values” endpoint with a list of tags whose values we want returned and a query to generate the result set. The query is written in Fluidinfo’s super-simple query language. The headers of the response look like this:

>>> headers
{'cache-control': 'no-cache',
 'connection': 'keep-alive',
 'content-length': '287328',
 'content-location': 'https://fluiddb.fluidinfo.com/values?query=has+boingboing.net%2Fauthors%2Fmarkfrauenfelder+and+boingboing.net%2Fyear%3D2010&tag=boingboing.net%2Ftitle&tag=boingboing.net%2Fcreated_on&tag=boingboing.net%2Fcategories',
 'content-type': 'application/json',
 'date': 'Wed, 09 Feb 2011 10:55:50 GMT',
 'server': 'nginx/0.7.65',
 'status': '200'}
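The content-location above shows how the query and the tag list travel as ordinary URL parameters. A minimal sketch that reproduces that URL with the standard library follows; `values_url` is an illustrative helper, not part of fluiddb.py.

```python
from urllib.parse import urlencode


def values_url(query, tags):
    """Build a /values URL with one 'query' parameter and one 'tag' per tag."""
    params = [("query", query)] + [("tag", tag) for tag in tags]
    return "https://fluiddb.fluidinfo.com/values?" + urlencode(params)


url = values_url(
    "has boingboing.net/authors/markfrauenfelder and boingboing.net/year=2010",
    ["boingboing.net/title", "boingboing.net/created_on", "boingboing.net/categories"],
)
print(url)
```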

The actual results are a JSON object (of which the following is only a fragment):

{
  "results": {
    "id": {
      "f2976562-eba6-47e4-94a1-b36ffe9a2ab1": {
        "boingboing.net/created_on": {
          "value": "2010-10-14 13:14:14"
        }, 
        "boingboing.net/categories": {
          "value": [
            "science", 
            "technology", 
            "art and design", 
            "design"
          ]
        }, 
        "boingboing.net/title": {
          "value": "TED releases iPad app today"
        }
      }, 
      "627ebf2e-e38d-41da-a709-16294b4ab6f2": {
        "boingboing.net/created_on": {
          "value": "2010-02-19 11:29:36"
        }, 
        "boingboing.net/categories": {
          "value": [
            "culture"
          ]
        }, 
        "boingboing.net/title": {
          "value": "Miniboss T-shirt in the Boing Boing Bazaar"
        }
      } // etc... for lots of results
    }
  }
}

Happily, fluiddb.py has converted it into the Python equivalent so we can find out some useful information and look at individual results.

>>> len(body['results']['id']) # how many results do we have..?
1214
>>> body['results']['id'].keys()[0] # what's the id of the first result..?
u'f2976562-eba6-47e4-94a1-b36ffe9a2ab1'
>>> body['results']['id']['f2976562-eba6-47e4-94a1-b36ffe9a2ab1'] # show the record for the first result...
{u'boingboing.net/categories': {u'value': [u'science',
                                           u'technology',
                                           u'art and design',
                                           u'design']},
 u'boingboing.net/created_on': {u'value': u'2010-10-14 13:14:14'},
 u'boingboing.net/title': {u'value': u'TED releases iPad app today'}}
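Once the results are a plain dict you can slice them however you like. As a small example, this counts how often each category turns up. It runs on just the two records from the fragment above, allows for posts with no categories tag (an assumption about the data), and is written for Python 3, where the first key comes from `next(iter(...))` in place of `keys()[0]`.

```python
from collections import Counter

# The two records from the JSON fragment shown earlier.
body = {"results": {"id": {
    "f2976562-eba6-47e4-94a1-b36ffe9a2ab1": {
        "boingboing.net/created_on": {"value": "2010-10-14 13:14:14"},
        "boingboing.net/categories": {
            "value": ["science", "technology", "art and design", "design"]},
        "boingboing.net/title": {"value": "TED releases iPad app today"},
    },
    "627ebf2e-e38d-41da-a709-16294b4ab6f2": {
        "boingboing.net/created_on": {"value": "2010-02-19 11:29:36"},
        "boingboing.net/categories": {"value": ["culture"]},
        "boingboing.net/title": {"value": "Miniboss T-shirt in the Boing Boing Bazaar"},
    },
}}}

results = body["results"]["id"]
first_id = next(iter(results))  # Python 3 replacement for keys()[0]

counts = Counter(
    category
    for record in results.values()
    # Assumed: a post without categories simply lacks the tag in its record.
    for category in (record.get("boingboing.net/categories", {}).get("value") or [])
)
print(first_id, counts.most_common())
```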

Great! So you have all the tools you need to search and explore all the BoingBoing articles from the last ten years. That’s what a conventional data API provides.

However, Fluidinfo can do additional super-duper cool stuff..!

Super-duper cool stuff!

Fluidinfo is an openly writeable database where objects have value because they are annotated with data from different sources. That’s why anyone can tag any data to any object. Since you control who can use, read and control your namespaces and tags, you still maintain control of data and importantly create a mechanism for trust.

You can trust values annotated with tags from the boingboing.net namespace because only BoingBoing is allowed to create and edit anything under this namespace. Since BoingBoing has annotated objects with information about articles, it’s safe to assume the objects are about BoingBoing articles.

Here’s the super-duper stuff: you can contribute data to these objects too.

How..?

I’m glad you asked… 🙂

First of all you’ll need an account on Fluidinfo. Once you’ve signed up you’ll be the proud owner of a top-level namespace with the same name as your username. Before you can add data to objects you’ll need to create some tags. Here’s how:

>>> fluiddb.login('ntoll', 'top-secret-password') # change as appropriate
>>> newTag = {'name': 'tuba', 'description': 'Related to Tubas in some way so it must be awesome!', 'indexed': False}
>>> headers, body = fluiddb.call('POST', '/tags/ntoll', newTag) # create new tag in /ntoll namespace
>>> headers
{'cache-control': 'no-cache',
 'connection': 'keep-alive',
 'content-length': '104',
 'content-type': 'application/json',
 'date': 'Wed, 09 Feb 2011 13:08:52 GMT',
 'location': 'https://sandbox.fluidinfo.com/tags/ntoll/tuba',
 'server': 'nginx/0.7.65',
 'status': '201'}
>>> headers, body = fluiddb.call('GET', '/tags/ntoll/tuba', returnDescription=True)
>>> body
{u'description': u'Related to Tubas in some way so it must be awesome!',
 u'id': u'b03f6937-cebf-481d-a0eb-5fd355a8a602',
 u'indexed': False}

The new tag is given a name (“tuba”), description and an indication if it should be indexed. The “201” status that Fluidinfo returned confirms that the new tag was successfully created under the “ntoll” namespace.

In case you hadn’t guessed I like tubas! I’d like others to find other tuba related objects in Fluidinfo so I’ve decided I’ll attach this newly created tag to anything tuba-related, including BoingBoing posts. As it happens Fluidinfo helps me get a bunch of these posts with a search like this:

>>> headers, body = fluiddb.call('GET', '/values', tags=['boingboing.net/title', 'fluiddb/about',], query = 'fluiddb/about matches "tuba"')
>>> body
{
  "results": {
    "id": {
      "e6c108f4-bd10-4cd3-b7d5-ad549b988c28": {
        "fluiddb/about": {
          "value": "http://boingboing.net/2006/06/22/flaming-tuba-guy-dav.html"
        }, 
        "boingboing.net/title": {
          "value": "Flaming Tuba guy David Silverman on NBC Tonight Show 6/23"
        }
      }, 
      "0c006f04-0663-48d6-9f11-4e082e75eb51": {
        "fluiddb/about": {
          "value": "http://boingboing.net/2010/11/22/tuba-skinny-old-time.html"
        }, 
        "boingboing.net/title": {
          "value": "Tuba Skinny: Old timey blues and jazz street act from New Orleans"
        }
      }
    }
  }
}

I’ve simply queried for matches for the word “tuba” in the fluiddb/about tag. Now that I’ve got a couple of results I can tag them like so:

>>> for tubaItem in body['results']['id']:
...     header, body = fluiddb.call('PUT', '/objects/%s/ntoll/tuba' % tubaItem, "Umpah-tastical, man!")
...     print header['status']
'204'
'204'

Yay! I’ve added some information to a couple of objects about BoingBoing articles! Let’s just confirm this by asking Fluidinfo for all the objects tagged with ntoll/tuba:

>>> headers, body = fluiddb.call('GET', '/values', tags=['fluiddb/about', 'boingboing.net/title', 'ntoll/tuba', ], query="has ntoll/tuba")
>>> body
{
  "results": {
    "id": {
      "e6c108f4-bd10-4cd3-b7d5-ad549b988c28": {
        "ntoll/tuba": {
          "value": "Umpah-tastical, man!"
        }, 
        "fluiddb/about": {
          "value": "http://boingboing.net/2006/06/22/flaming-tuba-guy-dav.html"
        }, 
        "boingboing.net/title": {
          "value": "Flaming Tuba guy David Silverman on NBC Tonight Show 6/23"
        }
      }, 
      "0c006f04-0663-48d6-9f11-4e082e75eb51": {
        "ntoll/tuba": {
          "value": "Umpah-tastical, man!"
        }, 
        "fluiddb/about": {
          "value": "http://boingboing.net/2010/11/22/tuba-skinny-old-time.html"
        }, 
        "boingboing.net/title": {
          "value": "Tuba Skinny: Old timey blues and jazz street act from New Orleans"
        }
      }, 
      "024bf1b6-348d-4839-8700-cbb30d86fb97": {
        "ntoll/tuba": {
          "value-type": "image/jpg", 
          "size": 467947
        }, 
        "fluiddb/about": {
          "value": "CrossCountryTuba"
        }
      },  
      "a694f2d0-428e-4aaf-85d1-58e903f56b30": {
        "ntoll/tuba": {
          "value": "I play tuba!"
        }, 
        "fluiddb/about": {
          "value": "Object for the user named ntoll"
        }
      }
    }
  }
}

Oops, I forgot I’d already tagged a couple of non-BoingBoing objects with the ntoll/tuba tag: one whose about tag value is “CrossCountryTuba” and the other being the object that represents me in Fluidinfo.

Notice how the value for the ntoll/tuba tag on the object about “CrossCountryTuba” contains only metadata: the type of data stored by that tag on that particular object (image/jpg) and the size of the data (467947 bytes). Looks like it’s an image of some sort. Let’s get it and see:

>>> headers, body = fluiddb.call('GET', '/objects/024bf1b6-348d-4839-8700-cbb30d86fb97/ntoll/tuba')
>>> image = open('tuba.jpg', 'wb') # binary mode, since the body is image data
>>> image.write(body)
>>> image.close()

And what does tuba.jpg contain..?

CrossCountryTuba

Cool! Fluidinfo stores any type of data so long as you supply the appropriate MIME type when you upload the data.
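When you loop over a `/values` result you can tell the two kinds of entry apart by their keys: primitive values arrive under `value`, while opaque ones carry only `value-type` and `size`. A tiny sketch, where `describe` is a made-up helper and the sample entries are copied from the results above:

```python
def describe(entry):
    """Summarise one tag entry from a /values result."""
    if "value" in entry:
        return "primitive: %r" % (entry["value"],)
    # Opaque values are not inlined; only their MIME type and size are given.
    return "opaque: %s, %d bytes" % (entry["value-type"], entry["size"])


print(describe({"value": "I play tuba!"}))
print(describe({"value-type": "image/jpg", "size": 467947}))
```

For an opaque entry you then fetch the bytes with a separate GET on the object's tag, as in the tuba.jpg example.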

How did I get the data into Fluidinfo..?

>>> tuba = open('Desktop/tuba.jpg', 'rb') # open the original image in binary mode
>>> header, body = fluiddb.call('PUT', '/objects/024bf1b6-348d-4839-8700-cbb30d86fb97/ntoll/tuba', tuba.read(), mime='image/jpg') # notice how I specify the MIME type
>>> tuba.close()
>>> header['status'] # check the PUT succeeded (204 No Content, as with the earlier PUTs)
'204'
>>> header, body = fluiddb.call('PUT', '/objects/024bf1b6-348d-4839-8700-cbb30d86fb97/ntoll/attribution', 'Tuba photo source: http://www.flickr.com/photos/dust/3813581130 licensed under a CC-BY 2.0 license') # need to add attribution as per the license

Simple..!

Now we’ve covered a lot of ground, so let’s just consider where we’ve got to.

  • We have a consistent, simple and powerful API to play with.
  • We can retrieve values using a simple query language referencing data contributed from many different users.
  • We can contribute data ourselves in such a way that the data remains under our control.
  • We can put all our data in the right place. If I want to contribute something about a BoingBoing article I just tag it to the object representing the right BoingBoing article.
  • We can contribute all sorts of data be it searchable primitive values like numbers, text and booleans or opaque data such as images, audio or anything else for which you can specify a MIME type.

You’re armed with enough basic knowledge to both mine BoingBoing data and contribute to it too. In fact, if you look carefully you’ll find all sorts of interesting objects in Fluidinfo. Remember, to find out more about the API check out our technical documentation.

Dive in, have fun and we’re more than happy to answer questions.

Image credits: BoingBoing’s logo and font butchered with permission (thanks @mustardhamsters!), diagram generated by abouttag written by Nick Radcliffe and the “Cross Country Tuba image” is © 2009 Amanda M Hatfield under a Creative Commons license.

Marc Hedlund joins the Fluidinfo board

Filed under: Awesomeness,Essence,Happiness,Progress — Terry Jones @ 11:54 am

We’re really happy to announce that Marc Hedlund has joined the Fluidinfo board!

I’ve gotten to know Marc slowly over the last 10 years. We first met very briefly when he was CEO of Popular Power, a San Francisco start-up. Nelson Minar (Marc’s co-founder) and Derek Smith, two of my close friends who are very close to Marc, were both working there. Nelson and Derek, as well as several others including Fluidinfo investor and advisor Tim O’Reilly, have sky-high opinions of Marc. Hearing regular off-the-charts superlatives about Marc over the years always kept me interested in getting to know him better someday.

Marc was present at my first ever (abysmal!) solo VC pitch for Fluidinfo, to the ill-fated Bryce Roberts and Mark Jacobson of OATV in early 2007. During the presentation, Marc interrupted to ask if he could take a photo of my slide titled “Revenue”. I think he wanted it as an example of how not to pitch a VC. I’ve never forgotten. He snapped the pic, resumed his seat, and told me to carry on 🙂

Marc has a ton of experience. He founded and led Lucas Online, the internet subsidiary of Lucasfilm, was director of engineering at Organic Online, and was also CTO at Webstorm. After Popular Power he was VP of Engineering at Sana Security, and then Entrepreneur in Residence at OATV, gaining intimate knowledge of the world of venture capital and interacting with hundreds of start-up companies. Marc then co-founded Wesabe where he was Chief Product Officer before becoming CEO. These days he’s Chief Product Officer at Daylife in New York.

As you can probably imagine, we’re honored and excited to have Marc involved at Fluidinfo.

February 7, 2011

Wanted: a UI/UX virtuoso who wants to help change the world

Filed under: Essence,People — Terry Jones @ 12:27 pm

Image: brandon shigeta

Fluidinfo is an always-writable information layer designed to hold metadata of any type about anything. It has an information model simple enough for everyone to understand – as simple as using Post-it notes. We’ve reached the point in development where we want people as well as applications to be able to create and interact directly with information in Fluidinfo.

Our core and driving passions are based on thinking about how humans work with information: how we find, consume, create, remember, organize, and share. We’re looking for a UI/UX wizard who also thinks and cares deeply about these things, and who has experience and exquisite taste in building elegant and super low-friction interfaces to information. Someone who can tell us what Tufte got wrong, and why.

Most of our computational experiences take place in read-only environments, or ones in which we can only add information in ways that have been anticipated and approved. This read-only world makes working with information using a computer very unlike the way we work with information in the natural world. A read-only world inhibits ad hoc, creative, unanticipated uses of information. It inhibits personalization and customization and therefore effective filtering. It is wrong because it puts information ontology ahead of evolution, inhibiting the natural emergence of information communication conventions like @addressing and hashtags.

With Fluidinfo we’re trying to imagine and build a computational world in which we always have write permission. In which people are free to add information to anything – to personalize and customize, to filter and search on their information and combinations of information. A computational world in which we can collectively make information more valuable by storing it in context.

There’s a huge and challenging UX/UI component to this. We believe humans are actually very good at working with raw information. It’s applications that are confusing. Part of our challenge is to create an interface to anything (i.e., in a web of things sense) and to do so in as transparent a manner as possible.

We’ve been thinking about these ideas for years and have built Fluidinfo as a platform to support this kind of simple and always-writable information storage. Now we want to put a face on it and we’re looking for someone truly great to join us.

About Fluidinfo Inc.

Fluidinfo is an angel and VC backed start-up based in New York, which is where we want you. (Our development team is currently distributed.) We have a world-class set of investors, including: Betaworks, IA Ventures, RRE Ventures, Lerer Ventures, Chris Dixon and the Founder Collective, Esther Dyson, Tim O’Reilly, Joshua Schachter, Michael Parekh, and Andrew Rasiej.

We’ve been quietly working on architecture since our seed funding in 2010, but are now starting to show the world the kinds of things we want to enable. Just for a small taste, read about the writable API we recently built for BoingBoing in a single evening, or how to use Fluidinfo to put metadata onto tweets. To get a flavor of what Fluidinfo is aiming at, see Truly Social Data, Information. Naturally, Fluidinfo as a universal metadata engine, and Kaleidoscope: 10 takes on Fluidinfo.

We also have some exciting news coming up on Feb 14 at the O’Reilly Tools of Change conference in NY, so keep an eye out for that too.

About you

We’re not going to tell you in detail what skills you need, because you’ll hopefully know that better than we do.

As a guideline though, in terms of technology for UI, we’re mainly interested in things built using ubiquitous web tools and standards like HTML5 and Javascript. We’re less keen on Flash, and cannot stomach heavyweight proprietary UI platforms. Anything that requires a clunky download is a non-starter. We do everything with Linux and will be very happy if you know your way around that world too. Experience with mobile or desktop UI/UX will also be valuable.

The three most important things we’re looking for are, in order of importance: 1) brilliance in UX/UI thought and execution; 2) proficiency in building dynamic web content with Javascript / modern HTML etc; 3) graphic design skill or experience working with graphic design teams to convert working prototypes into beautiful products. 1 and 2 are much more important than 3.

Above all though, you need to be able to show us interactive interfaces you’ve built or designed, be passionate about UI/UX, and be able to talk convincingly and in depth about what makes things work and not work.

Hiring process

To apply, send email to jobs at fluidinfo dotcom and include:

  • An outline of why you’d like to join Fluidinfo
  • A CV
  • Pointers to your previous work
  • Names and contact details of at least two references

Hiring will involve initial telephone/Skype interviews, followed by in-person interview(s).

February 4, 2011

Dropping FluidDB as a product name, in favor of Fluidinfo

Filed under: Essence — Terry Jones @ 8:39 am

As of today, we’re dropping FluidDB as a product name and will just use Fluidinfo (which is of course also the name of our company). We’ve obviously put a lot of energy into the FluidDB name so it feels bad to know that some of that will be squandered, and that we might create short-term confusion with the change. But the downsides of a bad name are both real and important, and it’s time to fix the mistake. I’ve run the name change idea past dozens of people over the last two months, with almost universal approval—some of it very enthusiastic. So from now on it’s Fluidinfo, and only Fluidinfo.

Here are the main reasons for the change:

  • Having two names instead of just one was a source of confusion to many people.
  • The term “database” has too much inappropriate baggage. The mindset around databases is that they are used to carefully hoard and protect one’s own information. Fluidinfo is about combining information, about putting it in the same place, about openly writable objects with a different model of control. It’s a completely different mindset. Programmers, especially, have expectations about databases being something they download and incorporate into their application to hold just their data.
  • We were never particularly interested in the NoSQL debate (see also this O’Reilly GMT interview). Being occasionally classed as Yet Another NoSQL Database was inaccurate and led people towards apples vs oranges comparisons and confusion. For example: How does Fluidinfo compare to Hadoop? (note the two problems here – until now Fluidinfo has only been a company name, and FluidDB does not compare to Hadoop in terms of storage).

We very reluctantly began using “db” with the only purpose being to try to help VCs understand what we were building. That was before ideas of “cloud computing” and data as a service became well understood. It was helpful back then, but is not helpful now. Fittingly, after living with the baggage for a long time, the straw that broke the camel’s back was a VC who told me he almost didn’t take our meeting because he assumed from the name that we were just another of those NoSQL databases. When he said: One way or another I’m going to get you to drop that name, I knew we’d gone full circle—an inappropriate name had finally been recognized as being detrimental by the kind of person the name was supposed to be helping.

The name change can already be seen on our funky website (desperately in need of a facelift) and in the Fluidinfo documentation. We’ve also switched our Twitter account to @fluidinfo and are now using the #fluidinfo channel on IRC.

We’ll soon be moving some web pages around behind the scenes to improve their URLs, but will make sure the old links still work. If you’re interested in technical details of the API, please feel free to ask questions below, to join us in the Fluidinfo users mailing list, or to drop by the #fluidinfo channel on irc.freenode.net.

And….. stay tuned, we have some exciting Fluidinfo news right around the corner.

January 27, 2011

How we made an API for BoingBoing in an evening

Filed under: Awesomeness,Essence,Howto,Programming,Progress — Nicholas Tollervey @ 9:05 am

Yesterday the folks over at boingboing.net posted eleven years’ worth of posts as a zipped up XML file. XML is good, but having a searchable database of posts is better. So I (ntoll) am in the process of importing all the data into Fluidinfo. 🙂

When finished, every post and author in the boingboing data dump will be represented by an object in Fluidinfo and tagged with useful information. The diagram below shows a representation of what a typical object about a boingboing.net post looks like:

Tags on an object representing a boingboing.net post.

The object (the red blob with a unique ID written inside it) has several tags attached to it (named “boingboing.net/author” and “boingboing.net/comment_count” for example) with associated values (“Mark Frauenfelder” and “53” respectively).

Furthermore, while I was cleaning/preparing the data for upload I made sure to extract every domain name and URL referenced in each post and annotate the publication date as computer friendly values rather than just a human readable date.

An instant win is the ability to query data. For example, you’ll be able to search for all posts that link to techcrunch.com written in 2010 by Cory Doctorow. This is how to write the query in Fluidinfo’s super simple query language:

boingboing.net/domains contains "techcrunch.com" and
boingboing.net/year = 2010 and
boingboing.net/author = "Cory Doctorow"

The result will depend on how you make the query, but let’s assume you’re using a /values based call in Fluidinfo’s REST api and you’ve asked for each post’s title, publication date and a list of domains mentioned. You’ll get back some JSON encoded data that looks something like this:

{
    "results" : {
        "id" : {
            "05eee31e-fbd1-43cc-9500-0469707a9bc3" : {
                "boingboing.net/title" : {
                    "value" : "This is a made up title for illustrative purposes"
                },
                "boingboing.net/created_on" : {
                    "value" : "2010-08-19 13:23:41"
                },
                "boingboing.net/domains" : {
                    "value" : [
                        "techcrunch.com",
                        "microsoft.com"
                    ]
                }
            },
            "0521e31e-fbd1-43cc-9500-046974569bc3" : {
               ... more results ...
            }
        }
    }
}


Wait a minute..!?!? This is just as if boingboing.net had an API.

Actually, by importing the flat XML file into Fluidinfo they do have an API – for free! Because of Fluidinfo’s open nature anyone can now make use of boingboing’s data via a few simple and easy to construct RESTful calls to Fluidinfo.
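To give a feel for how little code is needed to consume such a result, here is a minimal sketch (written for Python 3) that flattens a payload shaped like the JSON above into plain dictionaries. The helper name flatten_values and the sample data are made up for illustration; nothing here talks to Fluidinfo itself.

```python
import json

# Sample payload shaped like the /values response shown in the post.
# The title and other values are illustrative, not real data.
SAMPLE = json.loads("""
{
  "results": {
    "id": {
      "05eee31e-fbd1-43cc-9500-0469707a9bc3": {
        "boingboing.net/title": {"value": "This is a made up title for illustrative purposes"},
        "boingboing.net/created_on": {"value": "2010-08-19 13:23:41"},
        "boingboing.net/domains": {"value": ["techcrunch.com", "microsoft.com"]}
      }
    }
  }
}
""")


def flatten_values(payload):
    """Turn a /values-style payload into a list of plain dicts.

    Hypothetical helper: each row holds the object ID under "id" plus
    one key per tag path, mapped directly to that tag's value.
    """
    rows = []
    for object_id, tags in sorted(payload["results"]["id"].items()):
        row = {"id": object_id}
        for tag_path, info in tags.items():
            row[tag_path] = info.get("value")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in flatten_values(SAMPLE):
        print(row["id"], row["boingboing.net/title"])
```

Each row can then be fed straight into whatever you like: a template, a CSV writer, a chart.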

But that’s not all..!

Fluidinfo isn’t just openly readable – it’s openly writable too.

Huh..?

Any user of Fluidinfo can tag data to any object. For example, I control a couple of tags called “ntoll/rating” and “ntoll/comment” which I could attach to any of the objects representing boingboing.net posts. By tagging an object with associated values I’m indicating what I thought about the post.

Importantly, I know which object I want to tag because it has a special unique tag called “about” whose value is the URL to the boingboing.net post in question. Other people who want to add information about this post will know to use the same object as me because the about tag-value tells them, er, what the object is about.

This brings me to the killer point: accessing data from boingboing.net is good, but the facility to annotate, discover and re-use everyone’s data about boingboing.net posts is better. That’s why we sometimes say we’re trying to do to databases what Wikipedia did to encyclopaedias.

Users of Fluidinfo don’t have to retrieve information about boingboing.net posts by building queries using just boingboing.net tags. It’s possible to search using other people’s tags. For example, here’s how to search for posts to which I’ve given a relatively high rating and added a comment:

ntoll/rating > 6 and has ntoll/comment and
has boingboing.net/title
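If the semantics of that query aren’t obvious, the following toy sketch (Python 3) applies the same three conditions to a handful of in-memory objects. It is only an illustration of what the query means, not of how Fluidinfo evaluates it; the object IDs and tag values are invented.

```python
# Toy objects: each maps tag paths to values. All data is made up.
OBJECTS = {
    "obj-1": {"boingboing.net/title": "Post A", "ntoll/rating": 8, "ntoll/comment": "Great"},
    "obj-2": {"boingboing.net/title": "Post B", "ntoll/rating": 4, "ntoll/comment": "Meh"},
    "obj-3": {"boingboing.net/title": "Post C", "ntoll/rating": 9},
    "obj-4": {"ntoll/rating": 10, "ntoll/comment": "Not a boingboing post"},
}


def matches(tags):
    """Mirror: ntoll/rating > 6 and has ntoll/comment and has boingboing.net/title."""
    return (
        tags.get("ntoll/rating", float("-inf")) > 6  # rating present and above 6
        and "ntoll/comment" in tags                   # "has" only checks presence
        and "boingboing.net/title" in tags
    )


matching = sorted(oid for oid, tags in OBJECTS.items() if matches(tags))

if __name__ == "__main__":
    print(matching)
```

Note how tags from two different owners (ntoll and boingboing.net) sit side by side on the same object and are combined freely in one query.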

And users don’t have to just ask for boingboing.net related tag-values either. It’s possible to ask objects for all their tags that you have permission to see. For example, you could retrieve a matching post’s title, body, author and any comments I make about the post with the ntoll/comment tag.

I’m only scratching the surface here so I’ll follow up with another post soon with some example code and use cases. In the meantime, if you want to find out more feel free to get in touch with us. We’re more than happy to help.

If you’re a developer and want to play with the boingboing.net data you should take a read of my last post explaining how to explore Fluidinfo’s API with Python.

In case you were wondering, it really was only half an evening’s work to prepare the data and write the import script. 🙂

Note: The import is currently running but should be complete later this afternoon. Not all posts will be in Fluidinfo yet (so far we have everything up to the end of September 2008).

Image credits: Diagram generated by abouttag written by Nick Radcliffe and the “API Sign” is © 2006 ulybug under a Creative Commons license.

January 14, 2011

Exploring FluidDB with fluiddb.py

Filed under: Howto,Programming — Nicholas Tollervey @ 10:27 am

fluiddb.py is a Python module based upon work by Seo Sanghyeon. I (Nicholas Tollervey) extracted the module, extended it and added unit tests.

This post leads you from signing up to FluidDB to executing commands and queries using fluiddb.py. It assumes you’re already familiar with the concepts behind FluidDB and that you’re a developer looking to experiment with the API. If you’re not familiar or need a refresher take a look at the following slides:

We’ll be using Python but don’t assume any familiarity with that language. So let’s get started…

When you sign up for FluidDB the following steps happen:

In order to access the API with your new credentials you need to use basic HTTP authentication over SSL (i.e. the URI starts with “https”). Obviously, using the raw API with a browser or tools such as curl or wget isn’t practical for most people. Hence the need for fluiddb.py as a wrapper (there are lots of API wrappers for FluidDB in many different languages, check out our list of libraries on our website for more information).
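For the curious, basic HTTP authentication simply means sending an Authorization header containing the base64 encoding of “username:password”. The sketch below (Python 3, standard library only) shows how such a header is built; the function name is hypothetical and this is not necessarily how fluiddb.py does it internally.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic "Authorization" header."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    # Placeholder credentials, not a real account.
    print({"Authorization": basic_auth_header("username", "password")})
```

Because base64 is an encoding and not encryption, the header is only safe when sent over SSL, which is why the URI has to start with “https”.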

Installing fluiddb.py is as simple as typing:

$ pip install -U fluiddb.py

or

$ easy_install fluiddb.py

or, if you want to install from source:

$ git clone https://github.com/ntoll/fluiddb.py.git
$ cd fluiddb.py
$ python setup.py install

Simply import fluiddb to get started. The following Python terminal session demonstrates what I mean:

$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import fluiddb

The fluiddb.instance variable indicates which instance of FluidDB the module is using (it defaults to the main instance – the sandbox can be used for the purposes of testing/experimentation). Make use of the fluiddb.MAIN and fluiddb.SANDBOX “constants” to change instance as shown below.

>>> fluiddb.SANDBOX
'https://sandbox.fluidinfo.com'
>>> fluiddb.instance = fluiddb.SANDBOX
>>> fluiddb.MAIN
'https://fluiddb.fluidinfo.com'
>>> fluiddb.instance = fluiddb.MAIN

Use the login and logout functions to, er, login and logout (what did you expect..?):

>>> fluiddb.login('username', 'password')
>>> fluiddb.logout()

The most important function provided by the fluiddb module is call(). You must supply at least the HTTP method and path as the first two arguments to a call to the REST API. For example, the following call gets information about Terry Jones:

>>> fluiddb.call('GET', '/users/terrycojones')
({'status': '200', 'content-length': '69', 'content-location': 'https://fluiddb.fluidinfo.com/users/terrycojones', 'server': 'nginx/0.7.65', 'connection': 'keep-alive', 'cache-control': 'no-cache', 'date': 'Fri, 14 Jan 2011 15:15:51 GMT', 'content-type': 'application/json'}, {u'name': u'Terry Jones', u'id': u'05eee31e-fbd1-43cc-9500-0469707a9bc3'})

Notice how call() returns a tuple containing two items:

  • The header dictionary
  • The content of the response (if there is any)

Often it is simply better to do the following:

>>> headers, content = fluiddb.call('GET', '/users/test')

So the response headers get put into the “headers” variable and the actual content of the response is found in the “content” variable.

It is also possible to send the path as a list of path elements:

>>> headers, content = fluiddb.call('GET', ['about','yes/no','test','foo'])

This ensures that each element is correctly percent-encoded even if it includes problem characters such as the slash, ‘/’ (essential for using the “about” based API).
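The percent-encoding of path elements can be sketched with the standard library alone. This is a Python 3 illustration of the idea, with a hypothetical helper name, rather than a copy of what fluiddb.py does.

```python
from urllib.parse import quote


def encode_path(elements):
    """Join path elements into a URL path, percent-encoding each one.

    safe="" tells quote() to encode "/" as well, so a slash inside an
    element cannot be mistaken for a path separator.
    """
    return "/" + "/".join(quote(str(element), safe="") for element in elements)


if __name__ == "__main__":
    print(encode_path(["about", "yes/no", "test", "foo"]))
    print(encode_path(["about", "http://boingboing.net/"]))
```

This matters most for the “about” API, where the about value is frequently a URL and so is full of slashes and colons.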

If the API call involves sending JSON data to FluidDB, simply pass the appropriate Python dict object and fluiddb.py will “jsonify” it for you:

>>> headers, content = fluiddb.call('POST', '/objects', body={'about': 'an-example'})

If the body argument isn’t a Python dictionary then you must be HTTP PUTting a tag-value on an object. In which case, it’s possible to set the mime-type of the value passed in body:

>>> headers, content = fluiddb.call('PUT', '/about/an-example/test/foo', body='<html><body>Hello, World!</body></html>', mime='text/html')

If you’re PUTting a primitive value then fluiddb.py will automatically provide the correct mime-type for you:

>>> headers, content = fluiddb.call('PUT', '/about/an-example/test/foo', 12345)
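Roughly speaking, the decision between primitive and opaque values could look like the sketch below (Python 3). Treat it as an assumption-laden illustration: the helper names are hypothetical, the list of primitive types is my reading of the API, and the content type string should be checked against the technical documentation before you rely on it.

```python
import json

# Assumption: the content type used for primitive values.
# Verify against the API documentation.
PRIMITIVE_MIME = "application/vnd.fluiddb.value+json"


def is_primitive(value):
    """True for None, booleans, numbers, strings and lists of strings."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return True
    return isinstance(value, (list, tuple)) and all(
        isinstance(item, str) for item in value
    )


def prepare_body(value, mime=None):
    """Return a (content_type, body) pair for PUTting a tag value."""
    if mime is not None:
        # Opaque value: the caller says what it is and it is sent as-is.
        return mime, value
    if is_primitive(value):
        return PRIMITIVE_MIME, json.dumps(value)
    raise TypeError("non-primitive values need an explicit mime type")


if __name__ == "__main__":
    print(prepare_body(12345))
    print(prepare_body("<html><body>Hello, World!</body></html>", mime="text/html"))
```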

To send URI arguments simply pass them as keyword arguments to call():

>>> headers, content = fluiddb.call('GET', '/permissions/namespaces/test', action='create')

The “action='create'” argument will be turned into “?action=create” appended to the end of the URL sent to FluidDB.

Furthermore, if you want to send some custom headers to FluidDB (useful for testing purposes) then supply them as a dictionary via the custom_headers argument:

>>> headers, content = fluiddb.call('GET', '/users/test', custom_headers={'Origin': 'http://foo.com'})

Finally, should you be sending a query via the /values endpoint then you can supply the list of tags whose values you want returned via the tags argument. For example, the following call will return the about-tag value and the twitter screen name of those twitter users I have met in the real world:

>>> headers, content = fluiddb.call('GET', '/values', tags=['fluiddb/about', 'twitter.com/users/screen_name'], query='has ntoll/met')
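Under the hood a call like this ends up as a single URL. The sketch below (Python 3) shows one plausible way to assemble it, assuming the /values endpoint takes one “query” argument and a repeated “tag” argument; that parameter naming and the helper name are assumptions on my part, so check the API documentation if you build URLs by hand.

```python
from urllib.parse import urlencode


def values_url(instance, query, tags):
    """Build a /values URL from a query string and a list of tag paths.

    Assumption: one "query" parameter and one "tag" parameter per
    requested tag.
    """
    params = [("query", query)] + [("tag", tag) for tag in tags]
    return instance + "/values?" + urlencode(params)


if __name__ == "__main__":
    print(values_url(
        "https://fluiddb.fluidinfo.com",
        "has ntoll/met",
        ["fluiddb/about", "twitter.com/users/screen_name"],
    ))
```

Seeing the URL spelled out also makes it easier to try the same request with curl or a browser when debugging.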

If this walkthrough isn’t enough then check out the three screencasts below. I made them a few months ago so I’m demonstrating an older version of fluiddb.py but it’s pretty much unchanged (only the implementation details described in part two have changed a little).

Using FluidDB’s RESTful API with fluiddb.py (Part 1)

Using FluidDB’s RESTful API with fluiddb.py (Part 2)

Using FluidDB’s RESTful API with fluiddb.py (Part 3)

As always, if you have any questions, encounter problems or simply want to give us feedback, get in touch!

January 10, 2011

Introducing the Fluidinfo Explorer

Filed under: Awesomeness,Howto — Nicholas Tollervey @ 9:09 am

Normally users will use applications that use Fluidinfo and are unaware that the application is using Fluidinfo. Programmers will use Fluidinfo through its API. So, what if you’re a non-programmer and not using an application and you just want to have a look around inside Fluidinfo? Pier-Andre Parent has written the Fluidinfo Explorer – a web-based “explorer” GUI. If you’re not a developer, this is probably your best way of starting to interact with Fluidinfo without having to get into all the nitty-gritty details of the API. It’s like the file-system explorer you find on Windows, Mac or Linux.

We’ve found the Explorer so useful that we’ve made it available via the explorer.fluidinfo.com name. The URL you visit in your browser is very important. The pattern is http://explorer.fluidinfo.com/INSTANCE/NAMESPACE where INSTANCE is either “fluidinfo” or “sandbox” and NAMESPACE is the name of the user, organisation or application you’re interested in. For example, the following link will display my (ntoll’s) top-level namespace in the explorer: http://explorer.fluidinfo.com/fluidinfo/ntoll

The result will look something like the following:

Notice how the namespace / tag structure is displayed in a collapsible tree control on the left hand side. The main body of the user interface contains a helpful welcome message, and at the top right hand side are the login button and a search box for queries written in Fluidinfo’s query language.

Right click a namespace or tag to view and update its attributes, to create and delete namespaces and tags, and to set permissions on them. Clicking on a tag on the left hand side fills the main area of the UI with all the objects that it has tagged:

So far, so simple…

But what about exploring the tags on a specific object? Click on an object’s ID in the result set to display a list of all the tags attached to it. Click “Load all tag values” to display the associated values. Notice how the explorer differentiates between primitive (numbers, booleans, strings etc.) and opaque (images, audio, binary files etc.) values – primitive values are displayed whereas the cells for opaque values contain a description of the type of value stored in Fluidinfo:

Click the “open” link next to an opaque tag value to trigger a pop-up presenting that value. In this example the value is an image:

Finally, if you follow the “View visual representation” link a rather nice graphical representation of the object is presented to you:

These diagrams are automatically generated by yet another third party application created by Nicholas Radcliffe and hosted on Google’s AppEngine. Given an appropriate URL a rather cool image is generated, e.g., http://abouttag.appspot.com/id/butterfly/8c2860e1-0d3f-47aa-9064-8a682cea6154.

The great thing about the explorer is that it provides an intuitive and visual representation of how Fluidinfo is structured. Have fun exploring!
