Fluidinfo

March 21, 2011

Examples of Fluidinfo O’Reilly API queries

Filed under: APIs,Data,Howto,Programming — Nicholas Tollervey @ 9:40 am

This post is all about querying the O’Reilly book and author information recently imported into Fluidinfo. If you want the skinny on Fluidinfo’s query language in glorious in-depth techno-geek-speak then check out the documentation. If you’d rather see some real world examples, read on…

In Fluidinfo, objects represent things (and all objects have a unique id). Information is added to objects using tags. Tags can have values, and tag names are organized into namespaces that give them context. Permissions control who can see and use namespaces and tags.

Objects do not belong to anyone and don’t have permissions associated with them. They’re openly writable. Anyone can tag anything to any object. Many objects have a special globally unique “about” tag value that indicates what they are about. Interaction with Fluidinfo is via a REST API.

That’s Fluidinfo in a nutshell.

In another article published today I describe the Fluidinfo tags and namespaces used to annotate objects with O’Reilly data. The tags are attached to objects for O’Reilly books and authors. Both kinds of objects have about tags. So a trivial first kind of query is to go directly to an object that’s about a book. For example, to get information about the object representing the book “Open Government” visit the URL http://fluiddb.fluidinfo.com/about/book:open government (daniel lathrop; laurel ruma).

You’ll get back a JSON response containing a list of all the tags (that you have permission to read) attached to that object and the object’s globally unique id. Similarly, you can go directly to the object for an O’Reilly author http://fluiddb.fluidinfo.com/about/author:tim oreilly.

In case you’re wondering about the format of these book and author about tags, we used the abouttag library written by Nicholas Radcliffe to generate them. They’re designed to be readable, easy to generate programmatically, and unlikely to result in collisions. You don’t have to remember them though, as there are many other ways to get at objects, via querying, as we’re about to see.

Queries on tags and their values

Below are some examples of using Fluidinfo’s query language.

Presence

Return all the objects that have an O’Reilly title:

has oreilly.com/title

You can see the results at the following URL: http://fluiddb.fluidinfo.com/objects?query=has oreilly.com/title. Once again, the result is in JSON. It simply contains a list of the ids of matching objects (representing things that O’Reilly have tagged with a title).

That’s the equivalent of the following SQL statement:

SELECT id FROM oreilly.com WHERE title IS NOT NULL;

Caveat: There are no tables in Fluidinfo so it’s impossible to make a direct translation to SQL. This example and those that follow simply illustrate a conceptual equivalence to make it easier for those of you familiar with SQL to get your heads around the Fluidinfo query language.

Comparison

Return all the O’Reilly objects whose price is less than $40 (the price is stored in cents).

oreilly.com/price-us < 4000

Here it is as a URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/price-us < 4000

In SQL it would be:

SELECT id FROM oreilly.com WHERE price-us < 4000;

Text Matching

Return all the O'Reilly objects that have "Python" in the title.

oreilly.com/title matches "Python"

The resulting URL: https://fluiddb.fluidinfo.com/objects?query=oreilly.com/title matches "Python"

In SQL:

SELECT id FROM oreilly.com WHERE title LIKE '%Python%';

Set Contents

Return all the O'Reilly objects representing authors who were involved in writing the work with ISBN "9781565923607" (which is the unique ID O'Reilly use in their catalog). The value of oreilly.com/authors/works tags is always a set of unique ISBN numbers like this: ["9781565923607", "9781565563728", "9781627397284"].

oreilly.com/authors/works contains "9781565923607"

The resulting URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/authors/works contains "9781565923607"

In SQL:

SELECT id FROM oreilly.com/authors WHERE '9781565923607' in (SELECT works FROM oreilly.com/authors);

(Actually, the similar "IN" operation in SQL isn't a very good example since it results in verbose monstrosities like the above.)

Exclusion

Return all the O'Reilly books that were published in 2001 except those published in April.

oreilly.com/publication-year=2010 except oreilly.com/publication-month=4

The resulting URL: https://fluiddb.fluidinfo.com/objects?query=oreilly.com/publication-year=2010 except oreilly.com/publication-month=4

In SQL:

SELECT id FROM oreilly.com WHERE year=2010 and month<>4;

Logic

It's possible to use the and and or logical operations. For example, return all the O'Reilly books whose title matches "Python" and were published before 2005:

oreilly.com/title matches "Python" and oreilly.com/publication-year < 2005

The resulting URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/title matches "Python" and oreilly.com/publication-year < 2005

In SQL:

SELECT id FROM oreilly.com WHERE title LIKE '%Python%' and year < 2005

Grouping

Return all the objects representing O'Reilly books mentioning "Python" in their title that were published in either 2008 or 2010.

oreilly.com/title matches "Python" and (oreilly.com/publication-year=2008 or oreilly.com/publication-year=2010)

The resulting URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/title matches "Python" and (oreilly.com/publication-year=2008 or oreilly.com/publication-year=2010)

In SQL:

SELECT id FROM oreilly.com WHERE title LIKE '%Python%' AND (year = 2008 OR year = 2010);

Querying across different data sets

Fluidinfo can query seamlessly across tags from different sources that are stored on the same object. E.g., return the titles of all O'Reilly books that Terry Jones owns.

has oreilly.com/title and has terrycojones/owns

The resulting URL: http://fluiddb.fluidinfo.com/objects?query=has oreilly.com/title and has terrycojones/owns

In SQL:

Well, it's actually not clear how you'd do this in SQL. Presumably there'd need to be some kind of table join, supposing that were possible!

Getting back tags on objects matching a query

It's also possible to indicate which tag values to return for each matching object. This is done by using the Fluidinfo /values HTTP endpoint and specifying the tag values to return as arguments in the URL path. For example, if I wanted the title, author names and publication year of all the O'Reilly books with the word "Python" in the title published before 2006 then I'd use the following query:

oreilly.com/title matches "Python" and oreilly.com/publication-year < 2006

and append the wanted tags to the URL after the query (in any order):

&tag=oreilly.com/title&tag=oreilly.com/author-names&tag=oreilly.com/publication-year

The resulting URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/title matches "Python" and oreilly.com/publication-year < 2006&tag=oreilly.com/title&tag=oreilly.com/author-names&tag=oreilly.com/publication-year

This is similar to the following SQL:

SELECT title, authors, year FROM oreilly.com WHERE title LIKE '%Python%' AND year < 2006;

Fluidinfo returns a JSON object like this:

{u'results': {u'id': {u'1a91e021-7bce-4693-bfa5-0dc437fe1817': 
    {u'oreilly.com/author-names': {u'value': [u'Anna Ravenscroft', u'David Ascher', u'Alex Martelli']},
     u'oreilly.com/publication-year': {u'value': 2005},
     u'oreilly.com/title': {u'value': u'Python Cookbook, Second Edition'}},
u'1d25baae-b977-4ff4-bb77-01c52bd1d339': 
    {u'oreilly.com/author-names': {u'value': [u'Fredrik Lundh']},
     u'oreilly.com/publication-year': {u'value': 2001},
     u'oreilly.com/title': {u'value': u'Python Standard Library'}},
u'3360f05f-9bf4-4da5-abc0-0e3742809b98': 
    {u'oreilly.com/author-names': {u'value': [u'Fred L. Drake Jr', u'Christopher A. Jones']},
     u'oreilly.com/publication-year': {u'value': 2001},
     u'oreilly.com/title': {u'value': u'Python & XML'}},
u'9845b184-ef1b-46fb-8e7c-011da053dcb6': 
    {u'oreilly.com/author-names': {u'value': [u'Andy Robinson', u'Mark Hammond']},
     u'oreilly.com/publication-year': {u'value': 2000},
     u'oreilly.com/title': {u'value': u'Python Programming On Win32'}}}}}

It's also possible to update and delete tag values from matching objects. This process is explained in detail in the Fluidinfo documentation and this blog post.

Finally, rather than interacting with Fluidinfo directly using the raw HTTP API it's a good idea to use one of the client libraries listed here. For example, using the fluidinfo.py library the last example query can be executed as follows:

>>> import fluidinfo
>>> import pprint
>>> headers, result = fluidinfo.call('GET', '/values', tags=['oreilly.com/title', 'oreilly.com/author-names', 'oreilly.com/publication-year'], query='oreilly.com/title matches "Python" and oreilly.com/publication-year < 2006')
>>> pprint.pprint(headers)
{'cache-control': 'no-cache',
 'connection': 'keep-alive',
 'content-length': '937',
 'content-location': 'https://fluiddb.fluidinfo.com/values?query=oreilly.com%2Ftitle+matches+%22Python%22+and+oreilly.com%2Fpublication-year+%3C+2006&tag=oreilly.com%2Ftitle&tag=oreilly.com%2Fauthor-names&tag=oreilly.com%2Fpublication-year',
 'content-type': 'application/json',
 'date': 'Thu, 10 Mar 2011 15:17:58 GMT',
 'server': 'nginx/0.7.65',
 'status': '200'}
>>> pprint.pprint(result)
{u'results': {u'id': {u'1a91e021-7bce-4693-bfa5-0dc437fe1817': {u'oreilly.com/author-names': {u'value': [u'Anna Ravenscroft',
... etc ...

Learn more

Hopefully, this has explained enough to get you started. If you don't have a Fluidinfo account, you can sign up here. If you have any questions, please don't hesitate to get involved with the Fluidinfo community, contact us directly or join us on IRC. We'll be more than happy to help!

O’Reilly Fluidinfo Chrome extension

Filed under: Howto,People,Programming,Writable APIs — Nicholas Tollervey @ 9:40 am

To help people get going with the API competition announced today on the O’Reilly Radar site, Emanuel Carnevale has written a cool extension for Google’s Chrome browser. The extension shows some of the non-O’Reilly tags on the book objects and also lets you indicate which O’Reilly books you own. It does this by putting tags onto the objects representing O’Reilly books in Fluidinfo.

To install the extension onto your Chrome browser click on the following link (from within Chrome): https://fluiddb.fluidinfo.com/about/oreilly.com/fluidinfo/chrome-extension.crx. Your browser will guide you through what to do. It’s pretty obvious stuff. Once it’s installed you’ll see a new icon in the top right hand corner of the browser window between the address bar and the little spanner icon:

Click the icon and sign in with your Fluidinfo credentials. If you don’t yet have an account on Fluidinfo you can sign up here.

How do you use it..?

Simple. Go visit the O’Reilly catalog and click on one of the books you own. For example, I happen to be the proud owner of Natural Language Processing with Python. If you visit the page for the book you’ll notice a new small Fluidinfo icon in the book details:


Click the icon and you’ll see a pop-up like this:

You can click on the appropriate statement at bottom to indicate ownership or not, as the case may be.

The writable API gives us all a voice

The extension uses an “owns” tag in your top-level Fluidinfo namespace to indicate book ownership on the objects in Fluidinfo. For example, my tag is called “ntoll/owns”. The extension attaches this tag to the object representing the O’Reilly book whose page you are visiting.

Because the extension tags the exact same Fluidinfo objects that have the O’Reilly information, I can start to do some really cool searches. For example, I happen to know Terry has a particularly large O’Reilly “zoo” as do I (in fact, doesn’t every developer..?). We can see what books we both own about Python with the following query:

oreilly.com/title matches "Python" and has terrycojones/owns and has ntoll/owns

The following code snippet for running this query uses the fluidinfo.py client library from within the Python shell. Alternatively, you can see the result directly if you visit this URL.

>>> import fluidinfo
>>> import pprint
>>> headers, result = fluidinfo.call('GET', '/values', tags=['oreilly.com/title',], query='oreilly.com/title matches "Python" and has terrycojones/owns and has ntoll/owns')
>>> pprint.pprint(result)
{u'results': {u'id': {
                      u'01371c03-9097-4267-a137-ae88a23790ef': {u'oreilly.com/title': {u'value': u'Python Pocket Reference, Fourth Edition'}},
                      u'4e9c42b6-68cb-43f5-9b75-60af9c0bd5a7': {u'oreilly.com/title': {u'value': u'Programming Python, Fourth Edition'}},
                      u'cd0838db-96ae-42ae-98c9-248a1507e2bb': {u'oreilly.com/title': {u'value': u'Python in a Nutshell, Second Edition'}}}}}

This illustrates how anyone can add tags to the objects being used by O’Reilly, and can then search based on their own additions and those of others. That’s why we say that Fluidinfo provides writable APIs. Cool πŸ™‚

Run with it!

There’s obviously a lot more that could be done with this extension. We kept it simple mainly because we wanted to give an example of how such an extension could be written. We hope it can provide a basis for your own efforts, especially if you’re entering the O’Reilly API competition. Emanuel has released the source code for the extension so you can grab it from Github and take it from there!

March 15, 2011

Fluidinfo named as Tim O’Reilly’s favorite startup

Filed under: Awesomeness,Events,People — Terry Jones @ 4:11 pm

Image credit: The Guardian

Tim O’Reilly was interviewed by Jason Calacanis on stage on the opening day of SXSW last week. Jason asked Tim to name his favorite start-up and Tim nominated Fluidinfo! Thanks Tim πŸ™‚

You can read excerpts from the interview in A SXSW fireside chat with Tim O’Reilly and Jason Calacanis on the TechChi blog:

O’Reilly’s favorite startup is Terry Jones’ Fluidinfo “because I’m not sure it’s going to work. He’s got his teeth into something that is bigger than he is. He may be overwhelmed and he may not get it,” O’Reilly said.” That passion it’s kind of like the Wright Brothers that wanted to fly or Thomas Edison and the light bulb… It’s not the entrepreneur chasing the million bucks, it’s the entrepreneur chasing the big idea.”

The interview was also picked up in a Business Insider article: Tim O’Reilly’s Favorite Startup.

March 5, 2011

Indicating (shared) interest in things without disclosing what they are

Filed under: Awesomeness,Essence,Howto,Programming — Terry Jones @ 2:21 pm

Imagine you want wanted to tell the world you were interested in something, for example an email address or a phone number, without telling the world what that thing was. That may not sound so interesting, but if several people were doing the same thing, it would be a mechanism for discovery of private things you had in common, without telling anyone else what those things were.

Russell Manley and I just thought of a simple way to do this using Fluidinfo. Here’s how we did it for the email addresses we know.

For each email address, compute its MD5 sum. Then, put a rustlem/knows or terrycojones/knows tag onto the object whose fluiddb/about value is the MD5 sum. The MD5 algorithm is essentially one-way, so even if someone finds a Fluidinfo object with either of our tags on it (which is trivial) they cannot recover the original email address.

This is pretty nice. We’re independently indicating things of interest, but neither of us is publicly saying what those things are. Because we’re putting our information onto the same objects in Fluidinfo, we can then easily discover things we have in common with each other (and with others), without the world knowing what. We can do the same thing for phone numbers, or anything else.

Getting the data into Fluidinfo was trivial. Here’s code I used to put a terrycojones/knows tag (with value True) onto the appropriate objects:

import sys, hashlib
from fom.session import Fluid

fdb = Fluid()
fdb.login('terrycojones', 'PASSWORD')

for thing in sys.stdin.readlines():
    about = hashlib.md5(thing[:-1]).hexdigest()
    fdb.about[about]['terrycojones/knows'].put(True)

You pass a list of email addresses to this script on standard input.

Russell and I each had about a thousand email addresses in our address books. A first question is how many addresses we know in common. You can get the answer to this with the simple Fluidinfo query has terrycojones/knows and has rustlem/knows. It turns out there are 53 common addresses. But the results don’t tell us which addresses those are, which is also interesting.

We also wrote a small script to print any tags ending in /knows for a set of email addresses given on the command line.

import sys, hashlib
from fom.session import Fluid
from fom.errors import Fluid404Error
fdb = Fluid()

for thing in sys.argv[1:]:
    about = hashlib.md5(thing).hexdigest()
    print thing, about
    try:
        for tag in fdb.about[about].get().value['tagPaths']:
            if tag.endswith('/knows'):
                print '\t', tag
    except Fluid404Error:
        print '\tunknown'

So given an email address, we can run the above and see who else knows (or claims to) that email address.

We find all this quite thought provoking. Without going into details of the social side of this, it’s worth pointing out that Fluidinfo makes this kind of information sharing very easy because it has a guaranteed writable object for everything, including all MD5 sums. Because the fluiddb/about tag is unique and isn’t owned by anyone, any user can add their knows tag to the object for any MD5 sum. The ability for users and applications to work independently and yet to share information by just following a fluiddb/about convention is one of the coolest things about Fluidinfo.

Finally, note that this system does not guarantee privacy. If someone already knows an email address or phone number (etc) they can compute its MD5 sum and examine the Fluidinfo tags on the corresponding object. Doing so they might see a rustlem/knows tag and would then be free to draw their own conclusion.

You can play too. All you need is a Fluidinfo account and the above code. Please let us know how you get on. For example, you can freely tweet any MD5 sums we have in common. We’re going to use the hashtag #incommon, like this.

February 25, 2011

Fluidinfo voted Top Technology Company at LAUNCH in San Francisco

Filed under: Events,Happiness — Terry Jones @ 12:43 am

Wow…. Fluidinfo was just voted “the clear winner” as Top Technology Company out of 100 start-ups in the LaunchPad at the LAUNCH conference in San Francisco!

Thanks to all the judges, especially to Robert Scoble, Marshall Kirkpatrick, Brian Alvey, Naval Ravikant and Mark Pesce. Mark said “Fluidinfo is totally crazy, but it’s the kind of crazy I love.” πŸ™‚

It’s weird, because 3 weeks ago I told Jason Calacanis, who suggested in email that we enter, that I didn’t think we should go up on stage at LAUNCH. We don’t (yet) do UI, and start-up events are heavily oriented towards sexy UI and ideas that can be explained in a couple of minutes. We were so busy already, it seemed like a recipe to do something mediocre if we threw together a demo. Instead I asked if we could just hang out in the LaunchPad with the 99 other start-ups and talk to people passing by. Today the judges went through the LaunchPad and talked to all the start-ups. Several told me we should go up on stage to do a 3 minute presentation, so I thought “why not?”, threw together some Keynote slides, opened some browser tabs (on the quick WeMet.At app written by Nicholas Tollervey in a few hours, and the increasingly great Fluidinfo Explorer (that link points at the ReadWriteWeb top-level namespace in Fluidinfo) written by Pier-Andre Parent, who we’ve never even met) and went for it.

The whole Fluidinfo team has been working really hard towards LAUNCH for the last 3 weeks. Everything at LAUNCH and recently announced on our blog has been built by all of us. It’s nice to win a prize, because after being funded we decided to be quiet, keep our heads down, build and train a team, and not even try to get people to use Fluidinfo until 2011. In January we began our first outward-facing efforts, building and releasing quite a few writable APIs. It’s still early days yet, and we have something very cool right around the corner.

Thanks to all the other great startups and the organizers at LAUNCH, especially Jason Calacanis, Tyler Crowley, and Jason Krute. Standing up for 20 hours and talking for about 30 was never so much fun!

February 23, 2011

Putting domain names onto data with Fluidinfo

Filed under: Data,Essence,Writable APIs — Terry Jones @ 11:15 am

Internet domain names can be thought of as a mechanism for attaching trust and reputation to digital information. We do this in two major ways: (1) by using domain names in the URLs of web pages, and (2) by putting them in the sender’s “From” address of email messages.

To give a concrete example, suppose you see some shoes for sale on a web page. If you look at the page URL and see the amazon.com or zappos.com domain name, trust and reputation knowledge springs instantly to mind. You know the quality is probably good, the price competitive, and that if the shoes are lost in shipping you’ll be sent another pair for no charge. On the other hand, if you see ebay.com in the URL, a different matrix of trust and reputation knowledge will spring to mind. A similar thing happens if you get email from someone you’ve never met. If you see stanford.edu or forbes.com in the email “From” line, reputation information springs to mind.

Looked at in this way, domain names are small tokens that we send alongside other pieces of content such as web pages and emails. The domain name carries vital trust and reputation information. Recognition and trust in domain names is globally distributed, spread variously through the brains of most of the people on the planet, with its integrity guaranteed by DNS. Domain names make the internet useful. Without them, digital information online would be almost useless as we could not confidently trust any 3rd party data.

Question: given that we can attach domain names to web pages and email messages, can we find a way to attach them to other things?

Domain names on data

We’re excited to announce that Fluidinfo now makes it possible to put domain names onto individual pieces of data.

To illustrate, the image on the right shows a fanciful example book object in Fluidinfo (large version). The tag names on the object are colored. You’ll see that some of them contain domain names: amazon.com/price, barnesandnoble.com/price and vintage.com/epub. Tags in Fluidinfo can have values, as illustrated by the amazon.com/price tag whose value for this book is $19.

The combination of a Fluidinfo tag name containing a domain and an associated tag value is exactly like a URL containing a domain name and an associated HTML value (i.e., a web page) or an email message with a domain name in its From line.

Because Fluidinfo objects don’t have owners (their tags do, though), any number of domain owners are free to put their information, branded with their domain name, onto any Fluidinfo object.

A killer combination: writable APIs with domain-branded data

Fluidinfo automatically provides a writable API for all its data. By allowing for domain names on data, domain holders who want to publish information about their products can now do so with an API that has three major advantages:

  • Your data is branded with your domain name.
  • Your data lives in a writable ecology of related data, collecting on the same Fluidinfo objects. This allows for search across data from different users and domains, put there by different applications. It allows for additional data of all kinds, for mashups, and for customization, personalization, and filtering.
  • Fluidinfo has a flexible permissions system at the level of its tags, so you maintain full control of your own data. You can make it public or private, or can allow or disallow access for specific others.

Because Fluidinfo objects are fine grained, composed simply of tags with values as in the image above, applications can fetch, search on, or combine specific pieces (or combinations) of data provided by different trusted sources with single requests. There is a general principle here: information becomes more useful and valuable when it is stored in context. This is illustrated vividly by Google, which collects web pages into one place to enable search, and by Wikipedia, which allows people to pool related information. Although these examples have very different models of trust and reputation, they both illustrate the underlying principle.

Getting your domain name in Fluidinfo

To start using your domain in Fluidinfo, first sign up, using your domain name as your user name. Our sign-up system will recognize that the username is a domain and will send you an email telling you how to prove that you control the domain. Once that’s done, you can begin using Fluidinfo to upload information branded with your domain and to provide an API for others (or for your own company) to find your products or otherwise use the information you make available.

In other words, all Fluidinfo usernames that correspond to actual internet domains are automatically reserved for their owners. Besides preventing a chaotic land grab, this is how we can guarantee to people seeing information in Fluidinfo that the value of a Fluidinfo tag whose name includes a domain name can be trusted exactly as it would be if that domain appeared in a web page URL or email From address.

So there you have it… domain names on data. We’re very excited to see where this will lead and we’re actively building out some writable APIs with domain-branded data. You can too. Claim your domain name in Fluidinfo right now.

How to make an API in Fluidinfo

Filed under: Data,Howto,Programming — Nicholas Tollervey @ 11:10 am

It’s very simple really:

1. Register a domain/user on Fluidinfo

Start here. If you’re registering a domain name then we will require proof of ownership (the instructions explaining how to do this are very simple).

2. Create your namespaces and tags

Be careful, you’re choosing how your data will be structured in Fluidinfo. Some tips we’ve found useful:

  • Flat is good.
  • Use namespaces to differentiate between the different sorts of things you’ll be tagging (e.g. between books and authors).
  • Copy conventions (how do others organise their data?).
  • KISS! Keep it simple (stupid!).

3. Import your data into Fluidinfo

To help you, we have a Python based script/library called FLIMP. Nevertheless, there are lots of freely available libraries that you may want to adapt yourself.

4. Announce your new API

Programmers will interact with your data via the general Fluidinfo API, which is simple and well documented. All you need to do is tell the world that your data is available, and what namespaces and tags you’re using to store it in Fluidinfo.

That’s it.

Please feel free to get in touch at any time if you have any questions or would like to explore the possibility of Fluidinfo Inc. helping you to add your data to Fluidinfo.

ReadWriteWeb ReadWriteAPI

Filed under: APIs,Data,Writable APIs — Nicholas Tollervey @ 11:04 am

Over the weekend I scraped the 11300 or so articles in the ReadWriteWeb archive. These are a great source of technology news and analysis covering stories from 2003 to the present day. Rather than keep this to myself (and rather unsurprisingly) I imported the metadata about each article into Fluidinfo. Hey presto, another instant API emerges!

Here’s how it works. For each article in the ReadWriteWeb archive there is an object in Fluidinfo. Each object has a unique “about” tag-value: the URL of the article. Furthermore, each object is annotated with information using tags found under the readwriteweb.com top level namespace. Tags include title, extract, date, categories and so on. In other words, you might visualize each object something like this:

I’ve also created and annotated objects about each of the authors of ReadWriteWeb articles and tagged objects representing each website ever mentioned by ReadWriteWeb.

So, it’s now possible to use the API like this:

>>> import fluidinfo
>>> returnTags = ['readwriteweb.com/title', 'readwriteweb.com/author-name', 'readwriteweb.com/extract', 'readwriteweb.com/date', ]
>>> query = "readwriteweb.com/year = 2010 and readwriteweb.com/month = 5 and readwriteweb.com/day = 5"
>>> head, result = fluidinfo.call('GET', '/values', tags=returnTags, query=query)
>>> head['status']
'200'
>>> result
{u'results': 
    {u'id': 
        {u'05936b9b-4c20-4887-9607-f63752e7f274':
            {u'readwriteweb.com/author-name': {u'value': u'Sarah Perez'},
              u'readwriteweb.com/date': {u'value': u'May  5, 2010  7:24 AM'},
              u'readwriteweb.com/extract': {u'value': u"Feel like hacking your phone today? If you've got about 10 minutes to spare, you can turn your iPhone into a Wi-Fi hotspot using a combination of the ..."},
              u'readwriteweb.com/title': {u'value': u'How To Turn Your iPhone into a Wi-Fi Hotspot'}},
        ... etc....

What’s just happened..? I used a client library (fluidinfo.py) to ask Fluidinfo to return the author name, publication date, title and an extract of all ReadWriteWeb articles published on the 5th May 2010.

Being able to search and extract data from an API is cool, especially since you get this by virtue of simply hosting your data in Fluidinfo. But this is ReadWriteWeb we’re talking about. Happily, Fluidinfo can accommodate.

>>> fluidinfo.login('ntoll', 'mysecretpassword') # change as appropriate
>>> headers, result = fluidinfo.call('PUT', ['about', 'http://www.readwriteweb.com/archives/android_app_growth_on_the_rise_9000_new_apps_in_march_2010.php', 'ntoll', 'rating'], 10)
>>> headers
{'cache-control': 'no-cache',
 'connection': 'keep-alive',
 'content-type': 'text/html',
 'date': 'Wed, 23 Feb 2011 15:07:29 GMT',
 'server': 'nginx/0.7.65',
 'status': '204'}

The example above shows how I sign in and annotate the object “about” the article http://www.readwriteweb.com/archives/android_app_growth_on_the_rise_9000_new_apps_in_march_2010.php with a tag called ntoll/rating and an associated value of 10 (obviously I enjoyed this article). The HTTP 204 response status tells me the value was successfully tagged.

Let’s just pause here for a moment and consider what I’ve just been able to do. Because Fluidinfo is openly writable I’m able to annotate the objects about ReadWriteWeb articles with my own data. Since objects in Fluidinfo don’t have owners or permissions attached to them I didn’t have to ask ReadWriteWeb for permission to augment the data about the article in question. Furthermore, if I only want my buddies to see what my ratings are I can set the tag to be only visible to a specific group of people. In this way Fluidinfo remains openly writable yet I still retain ownership and control over my data.

We’ve seen “read” and “write”, but what about “web”..?

Well it turns out I can stretch this analogy even further. Because everyone is tagging the same objects (identified by their “about” tag values) the data is being linked by virtue of the context of the object. We’re starting to get a web of linked data (yeah, I know, bear with me on this one…).

Since I can search and retrieve using any of the tags for which I have “read” permission I can start to create really cool mash-ups of data like this:

>>> header, result = fluidinfo.call('GET', '/values', tags=['fluiddb/about', 'boingboing.net/mentioned', 'readwriteweb.com/mentioned'], query="has boingboing.net/mentioned and has readwriteweb.com/mentioned and has unionsquareventures.com/portfolio")
>>> header
{'cache-control': 'no-cache',
 'connection': 'keep-alive',
 'content-length': '23528',
 'content-location': 'https://fluiddb.fluidinfo.com/values?query=has+boingboing.net%2Fmentioned+and+has+readwriteweb.com%2Fmentioned+and+has+unionsquareventures.com%2Fportfolio&tag=fluiddb%2Fabout&tag=boingboing.net%2Fmentioned&tag=readwriteweb.com%2Fmentioned',
 'content-type': 'application/json',
 'date': 'Wed, 23 Feb 2011 15:24:36 GMT',
 'server': 'nginx/0.7.65',
 'status': '200'}
>>> len(result['results']['id'])
4
>>> for r in result['results']['id'].values():
...     print r['fluiddb/about']['value']
... 
http://www.twitter.com
http://www.etsy.com
http://www.boxee.tv
http://www.meetup.com

What..? I’ve just asked Fluidinfo for all the articles from BoingBoing and ReadWriteWeb about companies backed by Union Square Ventures that both BoingBoing and ReadWriteWeb have covered. It turns out there are four companies: Twitter, Etsy, Boxee and Meetup.

What do one of these results look like..?

{u'boingboing.net/mentioned': 
    {u'value': [u'http://boingboing.net/2009/11/06/vampireotherkinenerg.html',
                     u'http://boingboing.net/2010/01/11/ny-times-on-urban-ca.html',
                     u'http://boingboing.net/2010/10/26/ron-paul-supporter-w.html',
                     u'http://boingboing.net/2002/06/27/meetup-meatspace-cam.html',
                     u'http://boingboing.net/2004/03/17/wired-rave-awards.html',
                     u'http://boingboing.net/2006/01/05/net-pug-nabbed-by-cr.html']},
u'fluiddb/about': 
    {u'value': u'http://www.meetup.com'},
u'readwriteweb.com/mentioned': 
    {u'value':  [u'http://www.readwriteweb.com/archives/meetup_the_secret_campaign_weapon.php']}}

What was involved in making such a cool query possible..? Simply importing data into Fluidinfo.

I’ll say no more and let you ponder the implications of what I’ve just demonstrated…

February 15, 2011

How I made a writable API for Union Square Ventures in an hour

Filed under: APIs,Essence,Howto,Programming,Writable APIs — Terry Jones @ 9:11 am

Image: Eric Archivell

I was mailing Fred Wilson and Albert Wenger of Union Square Ventures late last year, talking about Fred’s article Giving every person a voice. Fred said

I hadn’t really thought that we are all about shrinking the minimal viable publishing object, but that may well be true in hindsight.

I wanted to illustrate Fluidinfo as doing both: providing a minimal viable way to publish data (with an API), and also giving everyone a voice. So I decided to build Union Square Ventures a minimal API, and to then add my voice. In an hour.

A minimal viable API for USV

USV currently has 30 investments. If you want to get a list of the 30 company URLs, how would you do it? A non-programmer would have no choice but to go to the USV portfolio page, and click on each company in turn, then right-click on the link to each company’s home page and copy the link address, and then add that URL to your list. That process is boring and error prone.

If you’re a programmer though, you’d find this ridiculously manual. You’d much rather do that in one command, for example if you’re collecting information on VC company portfolios, perhaps for research or to get funded. Or if you were building an application, perhaps to do what Jason Calacanis is doing as part of the collecting who’s funding whom on Twitter and Facebook. You want your application to be able to fetch the list of USV company URLs in one simple call.

So I made a unionsquareventures.com user in Fluidinfo (sign up here), did the repetitive but one-time work of getting their portfolio companies’ URLs out of their HTML (so you wouldn’t have to), and added it to Fluidinfo. I put a unionsquareventures.com/portfolio tag onto the Fluidinfo object about each of those URLs. In other words, because Fluidinfo has an object for everything (including all URLs), I asked it to tag that object.

That was just 7 lines of code using the elegant and simple Python FOM library for Fluidinfo written by Ali Afshar:

import sys
from fom.session import Fluid

fdb = Fluid()
fdb.login('unionsquareventures.com', 'password')
urls = [i[:-1] for i in sys.stdin.readlines()] # Read portfolio URLs from stdin

for url in urls:
    fdb.about[url]['unionsquareventures.com/portfolio'].put(True)

As a result, using the jsongrep script I wrote to get neater output from JSON, I can now use curl and the Fluidinfo /values method to get the list of USV portfolio companies in the blink of an eye:

curl 'http://fluiddb.fluidinfo.com/values?query=has%20unionsquareventures.com/portfolio&tag=fluiddb/about' |
jsongrep.py results . . fluiddb/about value | sort
u'http://amee.cc'
u'http://getglue.com'
u'http://stackoverflow.com'
u'http://tumblr.com'
u'http://www.10gen.com'
u'http://www.boxee.tv'
u'http://www.buglabs.net'
u'http://www.clickable.com'
u'http://www.cv.im'
u'http://www.disqus.com'
u'http://www.edmodo.com'
u'http://www.etsy.com'
u'http://www.flurry.com'
u'http://www.foursquare.com'
u'http://www.hashable.com'
u'http://www.heyzap.com'
u'http://www.indeed.com'
u'http://www.meetup.com'
u'http://www.oddcast.com'
u'http://www.outside.in'
u'http://www.returnpath.net'
u'http://www.shapeways.com'
u'http://www.simulmedia.com'
u'http://www.soundcloud.com'
u'http://www.targetspot.com'
u'http://www.twilio.com'
u'http://www.twitter.com'
u'http://www.workmarket.com'
u'http://www.zemanta.com'
u'http://zynga.com'

There you have it, a sorted list of all Union Square Ventures portfolio companies’ URLs, from the command line. I can do it, you can do it, and any application can do it.

The jsongrep.py program can also be used to pull out selective pieces of the output. For example, which of the companies have “ee” in their URL?

curl 'http://fluiddb.fluidinfo.com/values?query=has%20unionsquareventures.com/portfolio&tag=fluiddb/about' |
jsongrep.py results . . fluiddb/about value '.*ee' | sort
u'http://www.meetup.com'
u'http://amee.cc'
u'http://www.indeed.com'
u'http://www.boxee.tv'

So maybe, in order to be funded by USV, it helps to have “ee” in your URL? πŸ™‚

What about USV companies that don’t have “.com” URLs?

curl 'http://fluiddb.fluidinfo.com/values?query=has%20unionsquareventures.com/portfolio&tag=fluiddb/about' |
jsongrep.py results . . fluiddb/about value '.*(?

OK, these things are geeky, but that's part of the point of an API: to enable applications to do things. We've made the portfolio available programmatically, and you can immediately see how to do fun things with it that you couldn't easily do before. In fact, it's quite a bit more interesting than that. As a result of doing this work, I can tell you that there was a company listed a couple of months ago on the portfolio page that is no longer there. And there's a company that's been invested in that's not yet listed. That's a different subject, but it does illustrate the power of doing things programmatically.

This is a minimal viable API for USV because there's only one piece of information being made available (so far). But an API it is, and it's already useful.

It's also writable.

Giving everyone a voice

In a sense we've just seen that everyone has a voice. USV put a tag onto the Fluidinfo objects that correspond to the URLs of their portfolio companies and they didn't have to ask permission to do so.

But what about me? I'm a person too. I've met the founders of some of those companies, so I'm going to put a terrycojones/met-a-founder-of tag onto the same objects. Fluidinfo lets me do that because its objects don't have owners, its permission system is instead based at the level of the tags on the objects.

So I wrote another 7 line program, like the one above, and added those tags. I also added another USV tag, called unionsquareventures.com/company-name. Let's pull back just the names of the companies whose founders I've met:

curl 'http://fluiddb.fluidinfo.com/values?query=has%20unionsquareventures.com/portfolio%20and%20has%20terrycojones/met-a-founder-of&tag=unionsquareventures.com'/company-name |
jsongrep.py results . . . value | sort
u'Bug Labs'
u'Foursquare'
u'GetGlue'
u'Meetup'
u'Shapeways'
u'Stack Overflow'
u'Tumblr'
u'Twitter'
u'Zemanta'

Isn't that cool? I do indeed have a voice!

You have one too. If you sign up for a Fluidinfo account you can add your own tags and values to anything in Fludinfo. And you can use Fluidinfo, just as I've illustrated above, to make your own writable API. See also: our post from yesterday, What is a writable API?

Interview on writable book APIs & publishing at O’Reilly TOC

Filed under: Events — Terry Jones @ 9:06 am

Below is an interview I did yesterday with Mac Slocum at the O’Reilly TOC conference in New York. We discuss writable book APIs and why they matter, as well as talking about what that might mean for publishers, readers, and the publishing process in general. (You can also see the interview on YouTube.)

Watch live streaming video from oreillyconfs at livestream.com
« Newer PostsOlder Posts »

Powered by WordPress