Archive for the ‘Essence’ Category

Interview with Robert Scoble

Tuesday, April 24th, 2012

Fluidinfo CEO Russell Manley and I were interviewed by Robert Scoble at his home in Half Moon Bay a couple of weeks ago. I really enjoy talking to Robert. He’s unique in his ability to do both breadth and depth. He covers a wide array of the very latest internet technology at blistering speed, and yet he really digs in to the stuff that he finds fascinating. Robert interviewed me in Barcelona over three years ago – we spent about six hours together talking about Fluidinfo and technology in general. He was sitting in the front row during our TechCrunch disrupt presentation in 2010 (see the fun coincidence between Robert and John Borthwick at the 4 minute mark), and he was one of the judges when we won the Top Technology prize in 2011 at the LAUNCH conference. It’s great to be on Robert’s radar and to have his support.

The video is below. You’ll find some comments on it in Robert’s Google+ posting.

Getting Started With Fluidinfo is now out!

Tuesday, March 6th, 2012

Getting Started With Fluidinfo is now available in hardcopy and various eBook formats from O’Reilly. The authors, Nicholas Radcliffe (@njr) and Nicholas Tollervey (@ntoll), know Fluidinfo inside out, as you might hope. They’ve written multiple Fluidinfo client libraries, web applications, command line tools, visualizations, have written many blog posts about Fluidinfo, have imported tons of data into the system, and have both contributed to the design and architecture in many ways. The books is extremely well written. Both of the Nicholases are entertaining and clear writers.

The first chapter has a wonderful introduction and overview of Fluidinfo, and should be understandable by a broad audience. After that, things get more technical with a chapter on using Fish, a Fluidinfo shell, either from your shell command line or via Shell-fish, a web interface. Playing directly with Fluidinfo, adding to objects and running queries, is probably the best way to understand its (very simple!) data model. Then it’s on to programmatic access, using two Python libraries (one low level, one high level) and via Javascript. An example social book reader application is built from the ground up in Javascript. The book concludes with chapters on the REST API, advanced use of Fish, discussion of the special Fluidinfo about tag, and a description of the query language.

We hope you’ll grab a copy and come join us either on the fluidinfo-users or fluidinfo-discuss mailing lists, or on the #fluidinfo IRC channel on Freenode (chat right now with a web based client). We’ll be happy to say hi and to help get you going.

Next-generation tagging

Wednesday, January 4th, 2012

Image: rosweed

In 2003, Joshua Schachter released Delicious, and brought tagging to the web. Delicious gave normal users a place to store and share their information about any web page. By January 2007, a Pew Internet & American Life Project study had found that 28% of online Americans had used tagging. As offered by Delicious, and popularized by sites such as Flickr and Technorati, basic tagging has three fundamental properties:

  1. You can tag any URL.
  2. You don’t have to ask permission.
  3. Shared public tagging creates social value

Most fundamentally, tagging made the web more writeable. This changed our perception of the web itself, from something that was almost entirely read-only to something where we were suddenly invited to contribute. The change was a cornerstone of what Tim O’Reilly dubbed “Web 2.0″. It’s what Fred Wilson describes as “giving every person a voice“. Fred quotes Ev Williams, a Twitter founder, who says “society has not fully realized what this means”. One view of Twitter is that it offers a form of simple self-tagging. Viewing tagging as low-friction publication of small pieces of information about things, Twitter has done for people what Delicious did for URLs.

Tagging with Fluidinfo

The model of information in Fluidinfo takes tagging to a new level. Over the last three months we’ve been hard at work building a user interface (UI) to make this possible. The two main additional properties of tagging with Fluidinfo are:

  1. You can tag anything at all, not just URLs.
  2. Tags can optionally be given values.

Tagging is too powerful an idea to be restricted to just URLs or people. It should be possible to tag anything. And, as you’ll see below, tags with values are powerful and natural. We tag with values in the non-digital world all the time.

Tag anything at all

Every day each of us encounters thousands of things that are not URLs. With Fluidinfo you can tag anything you can name. You can tag songs, movies, and books – just put the name into the box at the top of the Fluidinfo page, hit RETURN, and you’ll be looking at the page for that thing in Fluidinfo. If you’re logged in, you can add tags. Look at what pops up on the left to see other information about the object. You can also tag a product name, an email address, a person’s name, an IP address, a license plate number, a place name, flight numbers, a word or phrase, a zip code, a stock symbol, a latitude/longitude pair, a Twitter @name or #hashtag, taxi medallion numbers, time/date combinations, phone numbers, a DNA sequence, a chess position or a Sudoku puzzle, etc. To get to the Fluidinfo object for something, just visit http://fluidinfo.com/about/something. The easiest way to jump to the Fluidinfo page for something you see on the web is via a browser extension (Chrome, Firefox, Safari). Just right-click on a link, an image, on selected text, or on the page itself, and jump straight to that thing in Fluidinfo.

Tag with values

With Fluidinfo, you can provide a value when tagging. Tagging with values will seem new to many people, but it’s actually not new at all. Look at the luggage tag image above. It has (implicitly) a ‘destination’ tag with a value of ‘JFK’, it has a tag named ‘Flight number’ with a value of 222, a tag named ‘Row’ with a value of 3, a ‘Sorting Symbol’ with value ‘B’ and something else whose value is 821474. Those tag names and values are all put on the same physical tag because they’re related, and it makes sense that they travel together, attached to the luggage. So, far from being new, tagging with values simply provides us with something familiar which has always existed in the physical world of tagging.

To give some examples in Fluidinfo, I have put a ‘rating’ tag onto Gone With the Wind with a value of 4. A tag value can be anything at all: a comment, a set of keywords, ‘longitude’ and ‘latitude’ tags with numeric values, even arbitrary data (e.g., some HTML, an image, a PDF file).

Most interesting, a tag value can be a URL (or list of URLs). You might tag an image with a value that is the URL of another image. You might put a ‘for-dad’ tag onto something with a value pointing to a video. In the Fluidinfo UI, tag values that are URLs are shown as simplified embedded pages or images.

Examples of things in Fluidinfo

Here are links to a variety of things in Fluidinfo. Take a look at the list of links on the left of each page. These are ‘views’ of the data, either from different sources or displayed in different ways. If you log in, you’ll be able to add your own tags.

A shared writable object for everything: Sudoku problems

Wednesday, June 15th, 2011

Image: wikipedia

Fluidinfo is composed of objects that have tags with values. One of the tags is special, the about tag (its full name is actually fluiddb/about). One of its properties is that its values are unique across Fluidinfo. In other words, there can only be one object with any particular about value. It’s a bit like Wikipedia, which you can think of as having a single shared writable page for every topic imaginable. Fluidinfo offers a shared writable object for everything imaginable. The last part of a Wikipedia URL is like a Fluidinfo about tag value. That’s very wiki-like, but other properties of Fluidinfo make it more useful for applications than a wiki is. For example, it has a permissions system, its data is typed, and it has a query language. If you’d like to learn more about the about tag, there’s an entire blog named after it, run by Nicholas Radcliffe.

This morning I was having breakfast with Esteve Fernández and Jamu Kakar and Jamu mentioned a friend who’s heavily into Sudoku. I mentioned that there are several mobile apps that let you take a picture of a Sudoku puzzle and then solve it for you. I also said “Fluidinfo has an object for every Sudoku puzzle”.

I thought I’d write a very short blog post to illustrate this. So I took the Sudoku puzzle from the Wikipedia entry and made a Fluidinfo object for it. I read the cells of the puzzle left-to-right, top-to-bottom, and used a hyphen to indicate an empty cell in the starting configuration. Written as a single string of characters, that’s 53--7----6--195----98----6-8---6---34--8-3--17---2---6-6----28----419--5----8--79. I used the Fluidinfo Explorer to create an object in Fluidinfo with that as the about value. Then I put a tag called terrycojones/solution onto that object, with value 534678912672195348198342567859761423426853791713924856961537284287419635345286179, which is the correct solution to the puzzle, again read left-to-right, top-to-bottom.

This illustrates a few things. Firstly, Fluidinfo really does have an object for all Sudoku puzzles (created as needed, of course). Second, I’ve established a convention for the about tag value to represent those things. I could have done this in many different ways, and the solution I chose may not be the best. If I were intending to add information about lots of Sudoku puzzles, I would publish my choice and encourage others to follow it (which anyone could do, since any Fluidinfo user may create an object with any about tag – if the object already exists it’s no problem, you just get the already existing object). Third, the terrycojones/solution tag I put onto the object may not be of much use to the wider world. But, I could give other people (or applications) that I trust write permission for that tag so they could tag the objects too. Fourth, if I thought those solutions were worth something, I could make the tag unreadable by default and try to sell access to it (i.e., allow only those who paid me to have read access).

Finally, this illustrates the simple way in which isolated activities, like individually solving a puzzle, can be made social through shared writable storage. If I used a Sudoku application that tagged the shared object with some subset of terrycojones/solved or terrycojones/working-on-it or terrycojones/too-hard or terrycojones/solution-time tags, and others used that application too, solving Sudoku puzzles would instantly be social. Any Sudoku application could use Fluidinfo to allow you to see puzzles your friends couldn’t do or were working on, it could show you the amount of time your friends took, give you hints, find errors, etc. It’s easy to think of a ton of possible social extensions to the Sudoku world, and these include collaborative efforts.

This is a nice example of how shared writable storage with an object for everything allows formerly isolated actions to easily be made social. I’m planning to write up some more simple examples. There are many of them.

Personalized filtering of friend requests in social networks

Wednesday, June 1st, 2011

In an O’Reilly Radar article titled Getting closer to the web2.0 address book in October 2010, I described how a set of applications might coordinate via shared storage to solve Tim O’Reilly‘s question:

Given that knowledge about who I know is in my phone, in the O’Reilly org chart, and in the set of authors of O’Reilly books, why do I still have to manually approve friend requests from Facebook, LinkedIn, etc.?

In that article I argued that this aspect of what Tim calls the web 2.0 address book should be solved not by an application, or even by applications making API calls to one another, but by a set of applications that communicate asynchronously via shared writable storage. As well as shared storage, also needed are a permissions system, a query language, and conventions used by the cooperating applications. No API calls are needed between applications. The interaction between the cooperating applications takes place entirely via shared storage. API calls are only used to add, retrieve, and query data in the shared storage.

At Gluecon in Denver last week, I gave a talk titled The Real Promise of Cloud Storage in which I argued that shared storage gives two big advantages. The second was that it allows the evolution of “asynchronous data protocols”. The final slides of the presentation showed the series of steps that would be taken to build what Tim was asking for. I wasn’t able to go into the detail of how this would work in the time available. So in this post I’ll give the detail of how this can be implemented using Fluidinfo. The post is long because I want to spell out the steps, the assumptions, the Fluidinfo tag permissions, the queries, the security aspects, and so on.

The business case

I’m going to pretend LinkedIn is the company that would like its users to have more flexibility in approving friend requests. Additional flexibility could include personalized friend resolvers (for example based on the O’Reilly org chart) and also applications that could assist in various way, and if desired even automate some responses (e.g., people you have called more than twice in the last month, or people you have put on a particular list on Twitter).

LinkedIn is a good candidate because they have traditionally placed less emphasis on being a burgeoning social network, and more on link quality, by requiring a higher level of proof to create more meaningful reciprocal connections. That seems to be changing slowly now, though. You used to need to know another user’s email to connect to them, but that has been relaxed over the last couple of years. It’s obvious why LinkedIn would do this: knowing connections between their members is valuable. LinkedIn will now show me suggested people I might know, and let me reach out to connect with them with a single click. No need to go look up email addresses, etc.

I suspect many people have valid unanswered friend connection requests sitting in their LinkedIn inboxes. I do, and I almost never make time to deal with them. Those requests, if accepted, would add value to LinkedIn. The (valid) connection requests that sit unanswered for whatever reason represent unrealized value. So it’s very much in LinkedIn’s interest to help people connect. They should be interested in a solution to Tim’s dilemma, as should Facebook and any social network whose friending mechanism is reciprocal.

Using the shared writable storage of Fluidinfo to solve this problem

Here is how this could be achieved with Fluidinfo.

1. LinkedIn announces that it will support personalized friend resolution via Fluidinfo. Users will have control over what kinds of friend resolution they want to enable. LinkedIn will use the linkedin.com user name in Fluidinfo. (Fluidinfo makes it possible to put domain names on data. Every Fluidinfo user name that corresponds to a domain is reserved for its internet domain owner.)

2. Suppose Tim, a LinkedIn user, decides to try the new functionality. LinkedIn directs him to a page where he can choose a Fluidinfo user name (let’s say "tim") and create a Fluidinfo tag called tim/friend-request/linkedin. Tim sets the permissions so that the Fluidinfo linkedin.com user can both add and delete the tag on Fluidinfo objects. He then tells LinkedIn his Fluidinfo user name. This is necessary because LinkedIn will later create data in Fluidinfo on Tim’s behalf. Note that LinkedIn only knows Tim’s Fluidinfo user name, not his password.

3. Suppose a user Alice now tells LinkedIn that she is a friend of Tim’s. LinkedIn creates a new object in Fluidinfo and puts an instance of tim/friend-request/linkedin onto it with a value (perhaps in JSON format) holding the information Alice is willing to supply as evidence that she knows Tim. This could include things like her Twitter user name, her phone number, the name of a company she used to work at with Tim, etc. LinkedIn also adds a random request identifier (more on which below).

The object in Fluidinfo looks like this (the tag value is in the rectangle under the tag name):


Note that the object does not have an owner. That’s a fundamental feature of Fluidinfo: objects don’t have owners. Instead, tags on objects have permissions. That’s crucial in what follows, because it allows any application to add more information to the object representing the friend request.

A friend resolver based on Twitter friends: TwitResolve

4. Let’s suppose a keen-eyed developer notices the LinkedIn announcement and decides to write a Twitter-based friend resolver called TwitResolve. The developer grabs the internet domain twitresolve.com and then gets the twitresolve.com user name on Fluidinfo (as the domain owner, only they can do this, as above). TwitResolve will use two tags in Fluidinfo, twitresolve.com/accepted and twitresolve.com/examined. It sets the permissions on the tag twitresolve.com/accepted so that only the Fluidinfo user called linkedin.com can read it.

5. Let’s suppose Tim hears of TwitResolve and decides to use it. The TwitResolve sign-up page asks Tim to change the permissions on the Fluidinfo tag tim/friend-request/linkedin so that the Fluidinfo user twitresolve.com can read the tag. (The setting of the permission could be done in various ways, which we’ll ignore for now.) By giving TwitResolve permission to read the tag that LinkedIn puts onto objects, Tim enables TwitResolve’s participation in helping to resolve friend requests for him. TwitResolve also asks Tim for his Twitter username (and gets Tim to prove that).

6. TwitResolve runs a process periodically that asks Fluidinfo for the outstanding requests for Tim that it hasn’t already examined. The relevant query in Fluidinfo’s query language is has tim/friend-request/linkedin except has twitresolve.com/examined.

7. In our case, this query finds the object shown above and TwitResolve examines the value of the tim/friend-request/linkedin tag. It talks to Twitter to discover whether Tim is following Alice (it can do this because Alice’s Twitter user name is in the tag value, and Tim provided his Twitter user name to TwitResolve). Suppose Tim is not following Alice. TwitResolve cannot conclude anything, so it simply puts a twitresolve.com/examined tag (with no value) onto the request object in order to avoid re-examining this request. At this point the Fluidinfo object looks as below and the friend request is still unfulfilled.


A friend resolver based on the O’Reilly org chart

8. News gets out quickly to the digerati and someone inside O’Reilly decides to write a resolver based on the O’Reilly org chart. They create two tags, similar to those created by TwitResolve, called oreilly.com/accepted and oreilly.com/examined, again setting the permissions on the former so that only the Fluidinfo user called linkedin.com can read that tag. Tim decides to use the new resolver, so he gives the oreilly.com user permission to read the tim/friend-request/linkedin tag.

9. The O’Reilly resolver might be started by cron each night. When it runs, it sends a Fluidinfo query, just like TwitResolve does. Suppose Alice does not work at O’Reilly. The details in the request cannot confirm her as a friend of Tim’s. Just as with TwitResolve, the O’Reilly program puts an oreilly.com/examined tag onto the object, and we arrive at the following:


A friend resolver based on Amazon books

10. To give a third example (to fulfill Tim’s requirements), another resolver could be written based on authors of books. Suppose it’s written by someone at Amazon. (The details of how this might successfully match Alice to Tim aren’t important here.) Suppose it is also unsuccessful in matching this friend request. It adds an amazon.com/examined tag to the object:


An iPhone friend resolver: iPhoneFriender

11. Next, someone decides to write a resolver that can match based on phone numbers, called iPhoneFriender. They get the iphonefriender.com domain name and user name in Fluidinfo. They will use the tags (iphonefriender.com/accepted and iphonefriender.com/examined), and set the permissions on the iphonefriender.com/accepted tag so that only the linkedin.com Fluidinfo user can read it.

When the iPhoneFriender application runs, it sends the Fluidinfo query has tim/friend-request/linkedin except has iphonefriender.com/examined and finds our object. It reads the request details. Suppose it finds Alice’s phone number in Tim’s phone address book, and can see whether Tim has called Alice or vice-versa, whether Tim has chosen not to accept her calls, etc. iPhoneFriender has preference settings to allow Tim to specify what kinds of requests it will ask him to confirm, and which to automatically accept. Suppose that one way or another the friend request is accepted.

12. The iPhoneFriender application needs to tell LinkedIn that Alice is recognized as a friend of Tim’s. So it puts an iphonefriender.com/accepted tag with value 9871261721498793 onto the original object. This is the unique request_id value from the original request. Remember that the iphonefriender.com/accepted tag has its permissions set so that only linkedin.com can read it. Hiding the identifier in this way prevents rogue applications from falsely claiming to have satisfied friend requests. Only an application that Tim has given permission to read the tim/friend-request/linkedin tag to could know the request identifier. Only LinkedIn can read the identifier value out of the iphonefriender.com/accepted tag attached by iPhoneFriender. iPhoneFriender also adds an iphonefriender.com/examined tag to the object to avoid repeating work.

The Fluidinfo object now looks as follows (tag values, when present, are shown in rectangles under the tag names):


Friendship, requited!

13. Later, LinkedIn sends Fluidinfo the query has tim/friend-request/linkedin. The search matches our object and LinkedIn then gets a list of its tags. For tags named */accepted (where * matches any user name) it tries to read the value of the tag. If it finds a tag whose value matches the identifier in the request tag on the object, LinkedIn adds the friendship link inside its own site. It also deletes the tim/friend-request/linkedin tag from the object, resulting in:


At this point we’re basically done. There are many possible variations on the above. For example, using timestamps to retry friend resolution, using timestamps to only examine recent requests, using separate tags to hold pieces of friend request information, using an extra linkedin.com tag to reduce the number of queries it must make to find outstanding requests, etc. Applications are not forced to use an examined tag to avoid repeating work; if they do they can name it whatever they please. It’s also easy to imagine more exotic resolvers, e.g., Tim giving people a secret random number and looking for that in the request (LinkedIn would have to allow Alice to add it to the request, obviously), etc. Participating applications could also clean up by finding objects with their examined tag but that no longer have a tim/friend-request/linkedin tag, and removing their examined and accepted tags (if present).

Why this is nice

The above dance is nice for several reasons:

  • This is an open, convention-based, extensible, and validated application ecosystem. LinkedIn just writes data to Fluidinfo and periodically checks for resolution. In effect it is giving Tim the power to use any application he wants to resolve the request in any way he wants. Participating applications just follow the established tag naming convention. LinkedIn knows that if any application attaches a tag of the form */accepted with value 9871261721498793 to the object, that it must represent a validated acceptance by Tim.
  • As a result, anyone can play. An idiosyncratic resolver such as that based on the O’Reilly org chart is as legitimate a contributor as any other. None of the resolvers needed to ask for permission (from LinkedIn) to participate, or needed to be anticipated by LinkedIn.
  • There are no API calls between the applications involved. This is significant because APIs have to be designed in advance and you need permission to use them. In our scenario, all communication between applications is done via a data protocol: adding to and retrieving from shared storage.
  • LinkedIn is free to ignore the */accepted tags from any application if it chooses.
  • Tim can withdraw permission for a resolver to work on his behalf, simply by taking away that application’s permission to read tim/friend-request/linkedin tags.
  • Tim can stop LinkedIn from creating friend resolution tags by removing the linkedin.com user’s permission to add the tim/friend-request/linkedin tag to objects. That would be somewhat extreme, seeing as LinkedIn is likely to offer its users a simpler way for people to turn off the feature, but it’s worth pointing out that Tim has control.
  • To those who don’t have permission to read tim/friend-request/linkedin tags, there is no way to see who has made the friend request. (The fact that Tim is the target could also be easily obscured, if wanted.)
  • All communication is convention-based and asynchronous. This resembles the way we (and other organisms), often communicate in natural systems. I suspect most information communication between living organisms is asynchronous, though I have no way to quantify this. Asynchronous communication via conventions in shared storage (e.g., those seen in Twitter with hashtags and @addressing) is so powerful because it is open-ended and evolutionary. Fit conventions (in the biological sense) will flourish. Conventions can be extended by any player, without harm. I wrote more on this in Dancing out of time: Thoughts on asynchronous communication.

Note that the above is just an example of how applications can communicate indirectly and asynchronously through shared storage using evolving conventions instead of using direct, synchronous, predefined API calls between one another. We have seen a solution to a difficult address book problem that has not involved writing an address book application. Instead, the problem is solved by a set of lightweight and loosely coupled cooperating applications communicating through data. I have (very slowly!) come to realize that this form of inter-application communication is an important part of what Fluidinfo makes possible. This is all enabled by the simple move to shared writable storage, coupled with a flexible permissions model and a query language.

Thanks for reading. I really hope you’ll find this as interesting as I do. Thanks to Nicholas Radcliffe, Tim O’Reilly, and Bar Shirtcliff for comments that greatly improved the above.

Indicating (shared) interest in things without disclosing what they are

Saturday, March 5th, 2011

Imagine you want wanted to tell the world you were interested in something, for example an email address or a phone number, without telling the world what that thing was. That may not sound so interesting, but if several people were doing the same thing, it would be a mechanism for discovery of private things you had in common, without telling anyone else what those things were.

Russell Manley and I just thought of a simple way to do this using Fluidinfo. Here’s how we did it for the email addresses we know.

For each email address, compute its MD5 sum. Then, put a rustlem/knows or terrycojones/knows tag onto the object whose fluiddb/about value is the MD5 sum. The MD5 algorithm is essentially one-way, so even if someone finds a Fluidinfo object with either of our tags on it (which is trivial) they cannot recover the original email address.

This is pretty nice. We’re independently indicating things of interest, but neither of us is publicly saying what those things are. Because we’re putting our information onto the same objects in Fluidinfo, we can then easily discover things we have in common with each other (and with others), without the world knowing what. We can do the same thing for phone numbers, or anything else.

Getting the data into Fluidinfo was trivial. Here’s code I used to put a terrycojones/knows tag (with value True) onto the appropriate objects:

import sys, hashlib
from fom.session import Fluid

fdb = Fluid()
fdb.login('terrycojones', 'PASSWORD')

for thing in sys.stdin.readlines():
    about = hashlib.md5(thing[:-1]).hexdigest()
    fdb.about[about]['terrycojones/knows'].put(True)
 

You pass a list of email addresses to this script on standard input.

Russell and I each had about a thousand email addresses in our address books. A first question is how many addresses we know in common. You can get the answer to this with the simple Fluidinfo query has terrycojones/knows and has rustlem/knows. It turns out there are 53 common addresses. But the results don’t tell us which addresses those are, which is also interesting.

We also wrote a small script to print any tags ending in /knows for a set of email addresses given on the command line.

import sys, hashlib
from fom.session import Fluid
from fom.errors import Fluid404Error
fdb = Fluid()

for thing in sys.argv[1:]:
    about = hashlib.md5(thing).hexdigest()
    print thing, about
    try:
        for tag in fdb.about[about].get().value['tagPaths']:
            if tag.endswith('/knows'):
                print '\t', tag
    except Fluid404Error:
        print '\tunknown'
 

So given an email address, we can run the above and see who else knows (or claims to) that email address.

We find all this quite thought provoking. Without going into details of the social side of this, it’s worth pointing out that Fluidinfo makes this kind of information sharing very easy because it has a guaranteed writable object for everything, including all MD5 sums. Because the fluiddb/about tag is unique and isn’t owned by anyone, any user can add their knows tag to the object for any MD5 sum. The ability for users and applications to work independently and yet to share information by just following a fluiddb/about convention is one of the coolest things about Fluidinfo.

Finally, note that this system does not guarantee privacy. If someone already knows an email address or phone number (etc) they can compute its MD5 sum and examine the Fluidinfo tags on the corresponding object. Doing so they might see a rustlem/knows tag and would then be free to draw their own conclusion.

You can play too. All you need is a Fluidinfo account and the above code. Please let us know how you get on. For example, you can freely tweet any MD5 sums we have in common. We’re going to use the hashtag #incommon, like this.

Putting domain names onto data with Fluidinfo

Wednesday, February 23rd, 2011

Internet domain names can be thought of as a mechanism for attaching trust and reputation to digital information. We do this in two major ways: (1) by using domain names in the URLs of web pages, and (2) by putting them in the sender’s “From” address of email messages.

To give a concrete example, suppose you see some shoes for sale on a web page. If you look at the page URL and see the amazon.com or zappos.com domain name, trust and reputation knowledge springs instantly to mind. You know the quality is probably good, the price competitive, and that if the shoes are lost in shipping you’ll be sent another pair for no charge. On the other hand, if you see ebay.com in the URL, a different matrix of trust and reputation knowledge will spring to mind. A similar thing happens if you get email from someone you’ve never met. If you see stanford.edu or forbes.com in the email “From” line, reputation information springs to mind.

Looked at in this way, domain names are small tokens that we send alongside other pieces of content such as web pages and emails. The domain name carries vital trust and reputation information. Recognition and trust in domain names is globally distributed, spread variously through the brains of most of the people on the planet, with its integrity guaranteed by DNS. Domain names make the internet useful. Without them, digital information online would be almost useless as we could not confidently trust any 3rd party data.

Question: given that we can attach domain names to web pages and email messages, can we find a way to attach them to other things?

Domain names on data

We’re excited to announce that Fluidinfo now makes it possible to put domain names onto individual pieces of data.

To illustrate, the image on the right shows a fanciful example book object in Fluidinfo (large version). The tag names on the object are colored. You’ll see that some of them contain domain names: amazon.com/price, barnesandnoble.com/price and vintage.com/epub. Tags in Fluidinfo can have values, as illustrated by the amazon.com/price tag whose value for this book is $19.

The combination of a Fluidinfo tag name containing a domain and an associated tag value is exactly like a URL containing a domain name and an associated HTML value (i.e., a web page) or an email message with a domain name in its From line.

Because Fluidinfo objects don’t have owners (their tags do, though), any number of domain owners are free to put their information, branded with their domain name, onto any Fluidinfo object.

A killer combination: writable APIs with domain-branded data

Fluidinfo automatically provides a writable API for all its data. By allowing for domain names on data, domain holders who want to publish information about their products can now do so with an API that has three major advantages:

  • Your data is branded with your domain name.
  • Your data lives in a writable ecology of related data, collecting on the same Fluidinfo objects. This allows for search across data from different users and domains, put there by different applications. It allows for additional data of all kinds, for mashups, and for customization, personalization, and filtering.
  • Fluidinfo has a flexible permissions system at the level of its tags, so you maintain full control of your own data. You can make it public or private, or can allow or disallow access for specific others.

Because Fluidinfo objects are fine grained, composed simply of tags with values as in the image above, applications can fetch, search on, or combine specific pieces (or combinations) of data provided by different trusted sources with single requests. There is a general principle here: information becomes more useful and valuable when it is stored in context. This is illustrated vividly by Google, which collects web pages into one place to enable search, and by Wikipedia, which allows people to pool related information. Although these examples have very different models of trust and reputation, they both illustrate the underlying principle.

Getting your domain name in Fluidinfo

To start using your domain in Fluidinfo, first sign up, using your domain name as your user name. Our sign-up system will recognize that the username is a domain and will send you an email telling you how to prove that you control the domain. Once that’s done, you can begin using Fluidinfo to upload information branded with your domain and to provide an API for others (or for your own company) to find your products or otherwise use the information you make available.

In other words, all Fluidinfo usernames that correspond to actual internet domains are automatically reserved for their owners. Besides preventing a chaotic land grab, this is how we can guarantee to people seeing information in Fluidinfo that the value of a Fluidinfo tag whose name includes a domain name can be trusted exactly as it would be if that domain appeared in a web page URL or email From address.

So there you have it… domain names on data. We’re very excited to see where this will lead and we’re actively building out some writable APIs with domain-branded data. You can too. Claim your domain name in Fluidinfo right now.

How I made a writable API for Union Square Ventures in an hour

Tuesday, February 15th, 2011

Image: Eric Archivell

I was mailing Fred Wilson and Albert Wenger of Union Square Ventures late last year, talking about Fred’s article Giving every person a voice. Fred said

I hadn’t really thought that we are all about shrinking the minimal viable publishing object, but that may well be true in hindsight.

I wanted to illustrate Fluidinfo as doing both: providing a minimal viable way to publish data (with an API), and also giving everyone a voice. So I decided to build Union Square Ventures a minimal API, and to then add my voice. In an hour.

A minimal viable API for USV

USV currently has 30 investments. If you want to get a list of the 30 company URLs, how would you do it? A non-programmer would have no choice but to go to the USV portfolio page, and click on each company in turn, then right-click on the link to each company’s home page and copy the link address, and then add that URL to your list. That process is boring and error prone.

If you’re a programmer though, you’d find this ridiculously manual. You’d much rather do that in one command, for example if you’re collecting information on VC company portfolios, perhaps for research or to get funded. Or if you were building an application, perhaps to do what Jason Calacanis is doing as part of the collecting who’s funding whom on Twitter and Facebook. You want your application to be able to fetch the list of USV company URLs in one simple call.

So I made a unionsquareventures.com user in Fluidinfo (sign up here), did the repetitive but one-time work of getting their portfolio companies’ URLs out of their HTML (so you wouldn’t have to), and added it to Fluidinfo. I put a unionsquareventures.com/portfolio tag onto the Fluidinfo object about each of those URLs. In other words, because Fluidinfo has an object for everything (including all URLs), I asked it to tag that object.

That was just 7 lines of code using the elegant and simple Python FOM library for Fluidinfo written by Ali Afshar:

import sys
from fom.session import Fluid

fdb = Fluid()
fdb.login('unionsquareventures.com', 'password')
urls = [i[:-1] for i in sys.stdin.readlines()] # Read portfolio URLs from stdin

for url in urls:
    fdb.about[url]['unionsquareventures.com/portfolio'].put(True)
 

As a result, using the jsongrep script I wrote to get neater output from JSON, I can now use curl and the Fluidinfo /values method to get the list of USV portfolio companies in the blink of an eye:

curl ‘http://fluiddb.fluidinfo.com/values?query=has%20unionsquareventures.com/portfolio&tag=fluiddb/about’ |
jsongrep.py results . . fluiddb/about value | sort
u’http://amee.cc’
u’http://getglue.com’
u’http://stackoverflow.com’
u’http://tumblr.com’
u’http://www.10gen.com’
u’http://www.boxee.tv’
u’http://www.buglabs.net’
u’http://www.clickable.com’
u’http://www.cv.im’
u’http://www.disqus.com’
u’http://www.edmodo.com’
u’http://www.etsy.com’
u’http://www.flurry.com’
u’http://www.foursquare.com’
u’http://www.hashable.com’
u’http://www.heyzap.com’
u’http://www.indeed.com’
u’http://www.meetup.com’
u’http://www.oddcast.com’
u’http://www.outside.in’
u’http://www.returnpath.net’
u’http://www.shapeways.com’
u’http://www.simulmedia.com’
u’http://www.soundcloud.com’
u’http://www.targetspot.com’
u’http://www.twilio.com’
u’http://www.twitter.com’
u’http://www.workmarket.com’
u’http://www.zemanta.com’
u’http://zynga.com’

There you have it, a sorted list of all Union Square Ventures portfolio companies’ URLs, from the command line. I can do it, you can do it, and any application can do it.

The jsongrep.py program can also be used to pull out selective pieces of the output. For example, which of the companies have “ee” in their URL?

curl ‘http://fluiddb.fluidinfo.com/values?query=has%20unionsquareventures.com/portfolio&tag=fluiddb/about’ |
jsongrep.py results . . fluiddb/about value ‘.*ee’ | sort
u’http://www.meetup.com’
u’http://amee.cc’
u’http://www.indeed.com’
u’http://www.boxee.tv’

So maybe, in order to be funded by USV, it helps to have “ee” in your URL? :-)

What about USV companies that don’t have “.com” URLs?

curl ‘http://fluiddb.fluidinfo.com/values?query=has%20unionsquareventures.com/portfolio&tag=fluiddb/about’ |
jsongrep.py results . . fluiddb/about value ‘.*(?<!\.com)$’
u’http://www.outside.in’
u’http://amee.cc’
u’http://www.cv.im’
u’http://www.buglabs.net’
u’http://www.returnpath.net’
u’http://www.boxee.tv’

OK, these things are geeky, but that’s part of the point of an API: to enable applications to do things. We’ve made the portfolio available programmatically, and you can immediately see how to do fun things with it that you couldn’t easily do before. In fact, it’s quite a bit more interesting than that. As a result of doing this work, I can tell you that there was a company listed a couple of months ago on the portfolio page that is no longer there. And there’s a company that’s been invested in that’s not yet listed. That’s a different subject, but it does illustrate the power of doing things programmatically.

This is a minimal viable API for USV because there’s only one piece of information being made available (so far). But an API it is, and it’s already useful.

It’s also writable.

Giving everyone a voice

In a sense we’ve just seen that everyone has a voice. USV put a tag onto the Fluidinfo objects that correspond to the URLs of their portfolio companies and they didn’t have to ask permission to do so.

But what about me? I’m a person too. I’ve met the founders of some of those companies, so I’m going to put a terrycojones/met-a-founder-of tag onto the same objects. Fluidinfo lets me do that because its objects don’t have owners, its permission system is instead based at the level of the tags on the objects.

So I wrote another 7 line program, like the one above, and added those tags. I also added another USV tag, called unionsquareventures.com/company-name. Let’s pull back just the names of the companies whose founders I’ve met:

curl ‘http://fluiddb.fluidinfo.com/values?query=has%20unionsquareventures.com/portfolio%20and%20has%20terrycojones/met-a-founder-of&tag=unionsquareventures.com’/company-name |
jsongrep.py results . . . value | sort
u’Bug Labs’
u’Foursquare’
u’GetGlue’
u’Meetup’
u’Shapeways’
u’Stack Overflow’
u’Tumblr’
u’Twitter’
u’Zemanta’

Isn’t that cool? I do indeed have a voice!

You have one too. If you sign up for a Fluidinfo account you can add your own tags and values to anything in Fludinfo. And you can use Fluidinfo, just as I’ve illustrated above, to make your own writable API. See also: our post from yesterday, What is a writable API?

What is a writable API?

Monday, February 14th, 2011

When we released the Fluidinfo API for Boing Boing two weeks ago, Simon Willison noted on his blog:

“Fluidinfo really is a fascinating piece of software.” …. “Writable APIs are much less common than read-only APIs – Fluidinfo instantly provides both.”

If you search online to try to discover what people mean by a “writable API”, it’s hard to find anything that merits the name. So what did Simon mean? What is a writable API?

Both Simon and the team at Fluidinfo think “writable API” should be a kind of shorthand for an API that provides access to underlying data that is writable. This is not meant in the trivial already-possible sense wherein you pass data to an API method that stores them into a database you can’t otherwise access. We mean it in a more fundamental sense: that the underlying data is writable. That anyone or any application can directly access the data storage layer and add new information to it – without the knowledge of the people who stored the original data. That sounds pretty radical. But if you have a model of control in which objects are not owned but their pieces are, it’s not scary at all. In fact it’s liberating.

And, you guessed it, Fluidinfo has exactly that model of control. Any information stored into Fluidinfo instantly has a writable API in the sense just described. Let’s see a concrete example from the recent Boing Boing data imported into Fluidinfo.

Below is an illustration of an object in Fluidinfo, showing a subset of the tags that are on every Fluidinfo object representing a Boing Boing article. (The image was generated using Nick Radcliffe‘s fun About Tag image generator for Fluidinfo objects. Click the image to see the all its tags.)

An object

Simply by virtue of being stored in Fluidinfo, Boing Boing instantly got an API for all their articles. The API lets you find Boing Boing articles, as represented by objects in Fluidinfo, via querying on tags such as those shown on the object above. For example, you can use the API to find Boing Boing articles published in December 2008 that were written by Cory Doctorow. Or you can get a list of all the Boing Boing articles that contain a reference to the domain www.whitehouse.gov. (You can see details of these sorts of queries in our article on Mining the Boing Boing API.)

Those kinds of searches on Boing Boing data were not previously possible. We put the whole thing together in a single evening, which illustrates how simple it can be to make a Fluidinfo-fueled API for your own information. As cool as these examples are, though, they’re just reading & searching Boing Boing controlled data, as with a traditional API. What about writing?

Writing the Boing Boing data – without stopping to ask permission

The tags on the object above were put there by the Fluidinfo user named boingboing.net. That user controls those tags, and has given the rest of us read permission. But no-one owns the Fluidinfo object that the tags are on. As a result, anyone with a Fluidinfo account (sign up here) can add any information to the exact same object.

To give a very simple example, suppose someone wrote a simple browser extension (or extensions) that let Boing Boing readers mark stories as being funny or not suitable for work. Two users, Alice and Bob, might then put alice/funny and bob/nsfw tags onto the above object. Assuming I had read permission on those tags, I could then find Boing Boing articles by Cory Doctorow that Alice enjoyed and Bob found too risqué for work. Someone else could write a browser extension that popped up a warning about NSFW content based on Bob’s tag. In fact, take a proper look at the object above, you’ll see that I have added a terrycojones/nsfw tag to it (terrycojones is my username in Fluidinfo).

That’s customization and personalization – in our hands. It’s adding data to the exact same objects that Boing Boing created, combining their data and ours as we please, and all without stopping to ask permission or requiring that a database administrator or programmer anticipate our idiosyncratic needs. Boing Boing and any applications they create, may not be aware of, care about, or even be able to detect the new data (depending on permissions).

In other words, we can say that Boing Boing has a writable API, because other people and other applications are always free to add information to the same objects that the Boing Boing API is providing access to. The same applies to any application or API that uses Fluidinfo. A writable API opens the door onto a very different world, allowing unlimited possibilities for mash-ups, new applications, extensions, widgets, etc. It allows arbitrary customization and personalization. Fluidinfo acts like a universal metadata engine, providing guaranteed write access to anything, with a permissions system at the level of the tag, not the object.

We’ll give another example of a simple but fun writable API tomorrow. Next week we’ll release a much more substantial one at the LAUNCH conference in San Francisco. We’re really excited about it, and have a series of not-to-be-missed upcoming blog posts on what we’ve been up to.

Stay tuned!

What the Post-It Note Can Teach Us About Apps and Data

Thursday, February 10th, 2011

On Feb 9th I gave a talk the the NYC Ignite event, titled “What the Post-It Note Can Teach Us About Apps and Data.” Below are my 20 slides (21 if you could image credits). These were advanced automatically every 15 seconds during the 5-minute talk. I’ll post a link to the video once it’s up.

While the topic may not seem to have anything to do with Fluidinfo, there is a very close connection. I’ll write about that another time.