Getting Started With Fluidinfo is now out!

March 6th, 2012 by Terry Jones

Getting Started With Fluidinfo is now available in hardcopy and various eBook formats from O’Reilly. The authors, Nicholas Radcliffe (@njr) and Nicholas Tollervey (@ntoll), know Fluidinfo inside out, as you might hope. They’ve written multiple Fluidinfo client libraries, web applications, command line tools, and visualizations; they’ve written many blog posts about Fluidinfo and imported tons of data into the system; and both have contributed to the design and architecture in many ways. The book is extremely well written. Both of the Nicholases are entertaining and clear writers.

The first chapter has a wonderful introduction and overview of Fluidinfo, and should be understandable by a broad audience. After that, things get more technical with a chapter on using Fish, a Fluidinfo shell, either from your shell command line or via Shell-fish, a web interface. Playing directly with Fluidinfo, adding to objects and running queries, is probably the best way to understand its (very simple!) data model. Then it’s on to programmatic access, using two Python libraries (one low level, one high level) and via Javascript. An example social book reader application is built from the ground up in Javascript. The book concludes with chapters on the REST API, advanced use of Fish, discussion of the special Fluidinfo about tag, and a description of the query language.

We hope you’ll grab a copy and come join us either on the fluidinfo-users or fluidinfo-discuss mailing lists, or on the #fluidinfo IRC channel on Freenode (chat right now with a web based client). We’ll be happy to say hi and to help get you going.

Import your tweets into Fluidinfo

February 6th, 2012 by Terry Jones

If you import your tweets into Fluidinfo, you can use our new web interface to see interesting information about all the @names, #hashtags and URLs you’ve mentioned on Twitter. To get going, just go to Fluidinfo and log in with Twitter (top right).

Fluidinfo will fetch your past tweets from Twitter and will examine them for all the @names, #hashtags, and URLs you’ve ever mentioned. That will probably take some minutes to complete (reload the page to check the import status).

For each of your tweets, we extract the @names, #hashtags, and URLs it mentions and we add to the Fluidinfo page for each of them. To make this more concrete, here are examples from people who have already imported their tweets into Fluidinfo:

Example: the #occupy hashtag

Here’s the Fluidinfo page for #occupy. Click the links on the left of that page to explore different views of #occupy information in Fluidinfo and across the web. For example, you can see mentions of #occupy by Paul Kedrosky, mentions of #occupy by Tim O’Reilly, and mentions of #occupy by Ethan Zuckerman. On the right is a screenshot that shows some of how #occupy appears to me when I look at my friends’ mentions of it (click to see a larger version). Along with the page for #occupy, Fluidinfo has a page for every hashtag.

Example: a URL

In December 2011, Fred Wilson blogged about Freedom To Innovate. Fluidinfo has a page for the URL of Fred’s post. Click the links on the left of that page to explore. Later, Tim O’Reilly tweeted about the post. Tim has imported his tweets into Fluidinfo, so you can see his "mentioned" tag on the URL of Fred’s post. Brad Feld, who has also imported his tweets, mentioned Fred’s post as well. On the right is a screenshot that shows some of how Fred’s post appears to me when I look at my friends’ mentions of it (click to see a larger version). Along with the page for Fred’s article, Fluidinfo has a page for every URL.

Example: @sarawinge

Esther Dyson mentioned (via retweeting) Sara Winge, so on the @sarawinge page in Fluidinfo you should expect to see Esther’s mentions of Sara. If you explore the views on the left of that page, you’ll also see mentions of Sara by @marcprecipice, @pkedrosky, @timoreilly, and others. Along with the page for @sarawinge, Fluidinfo has a page for every @username.

Example: scientific american

Just as it does for @names, Fluidinfo tags the user’s name (as given on Twitter). So because Joi Ito has mentioned @sciam (the Twitter account for Scientific American), the "scientific american" page in Fluidinfo has Joi’s mentions of @sciam. The views on the left of that page show mentions of Scientific American by others, as well as lots of other information from across the web. And yes, you guessed it, along with the page for "scientific american", Fluidinfo has a page for every name.

Add to these pages yourself!

If you import your tweets, your mentions of @names, hashtags and URLs will also be added to pages in Fluidinfo. But don’t forget that you can also add your own tags to any page at all. After you log in, click the green Tag button to add something. Enter a tag name (e.g., comment) and a value (the text of your comment) and click Submit.

Next up… search

In a follow-on post, I’ll show you how to use the secret terminal built into Fluidinfo. The terminal lets you search your tweets, find things different people have mentioned in common, and much more besides.

Next-generation tagging

January 4th, 2012 by Terry Jones

Image: rosweed

In 2003, Joshua Schachter released Delicious, and brought tagging to the web. Delicious gave normal users a place to store and share their information about any web page. By January 2007, a Pew Internet & American Life Project study had found that 28% of online Americans had used tagging. As offered by Delicious, and popularized by sites such as Flickr and Technorati, basic tagging has three fundamental properties:

  1. You can tag any URL.
  2. You don’t have to ask permission.
  3. Shared public tagging creates social value.

Most fundamentally, tagging made the web more writeable. This changed our perception of the web itself, from something that was almost entirely read-only to something where we were suddenly invited to contribute. The change was a cornerstone of what Tim O’Reilly dubbed “Web 2.0”. It’s what Fred Wilson describes as “giving every person a voice”. Fred quotes Ev Williams, a Twitter founder, who says “society has not fully realized what this means”. One view of Twitter is that it offers a form of simple self-tagging. Viewing tagging as low-friction publication of small pieces of information about things, Twitter has done for people what Delicious did for URLs.

Tagging with Fluidinfo

The model of information in Fluidinfo takes tagging to a new level. Over the last three months we’ve been hard at work building a user interface (UI) to make this possible. The two main additional properties of tagging with Fluidinfo are:

  1. You can tag anything at all, not just URLs.
  2. Tags can optionally be given values.

Tagging is too powerful an idea to be restricted to just URLs or people. It should be possible to tag anything. And, as you’ll see below, tags with values are powerful and natural. We tag with values in the non-digital world all the time.

Tag anything at all

Every day each of us encounters thousands of things that are not URLs. With Fluidinfo you can tag anything you can name. You can tag songs, movies, and books – just put the name into the box at the top of the Fluidinfo page, hit RETURN, and you’ll be looking at the page for that thing in Fluidinfo. If you’re logged in, you can add tags. Look at what pops up on the left to see other information about the object. You can also tag a product name, an email address, a person’s name, an IP address, a license plate number, a place name, flight numbers, a word or phrase, a zip code, a stock symbol, a latitude/longitude pair, a Twitter @name or #hashtag, taxi medallion numbers, time/date combinations, phone numbers, a DNA sequence, a chess position or a Sudoku puzzle, etc. To get to the Fluidinfo object for something, just visit The easiest way to jump to the Fluidinfo page for something you see on the web is via a browser extension (Chrome, Firefox, Safari). Just right-click on a link, an image, on selected text, or on the page itself, and jump straight to that thing in Fluidinfo.

Tag with values

With Fluidinfo, you can provide a value when tagging. Tagging with values will seem new to many people, but it’s actually not new at all. Look at the luggage tag image above. It has (implicitly) a ‘destination’ tag with a value of ‘JFK’, it has a tag named ‘Flight number’ with a value of 222, a tag named ‘Row’ with a value of 3, a ‘Sorting Symbol’ with value ‘B’ and something else whose value is 821474. Those tag names and values are all put on the same physical tag because they’re related, and it makes sense that they travel together, attached to the luggage. So, far from being new, tagging with values simply provides us with something familiar which has always existed in the physical world of tagging.

To give some examples in Fluidinfo, I have put a ‘rating’ tag onto Gone With the Wind with a value of 4. A tag value can be anything at all: a comment, a set of keywords, ‘longitude’ and ‘latitude’ tags with numeric values, even arbitrary data (e.g., some HTML, an image, a PDF file).

Most interesting, a tag value can be a URL (or list of URLs). You might tag an image with a value that is the URL of another image. You might put a ‘for-dad’ tag onto something with a value pointing to a video. In the Fluidinfo UI, tag values that are URLs are shown as simplified embedded pages or images.

Examples of things in Fluidinfo

Here are links to a variety of things in Fluidinfo. Take a look at the list of links on the left of each page. These are ‘views’ of the data, either from different sources or displayed in different ways. If you log in, you’ll be able to add your own tags.

Improvements to namespaces, tags and permissions

August 18th, 2011 by jkakar

Namespaces and tags provide a powerful mechanism for organizing information.  The Fluidinfo API provides a set of tools for creating, describing and using them to store information about anything and everything. Until now, you had to create them before you could use them to store values, but we’ve changed that. Namespaces and tags are now created automatically, on first use, provided you have permission to do so. A number of API calls that had to be made in the past are no longer necessary, which makes storing data easier and faster than before.

Permissions provide fine-grained privacy controls to define who has access to see and work with information in Fluidinfo. By default, Fluidinfo creates permissions that grant everyone read access to information, while limiting write access to the author of the information. This is good for the most part, as Fluidinfo and its users benefit from sharing information with each other. However, the default behaviour could cause surprising results when used with a namespace that had been locked down and made private, because new namespaces and tags created inside it would be public. This is no longer the case. Permissions for new namespaces and tags are now inherited from their parent namespace at creation time. Changing permissions on existing namespaces and tags afterwards does not propagate to their children.

Namespace permissions are inherited one-to-one. That is, the create namespace permission is copied to a new child namespace, the update namespace permission is copied to a new child namespace, and so on. Tags are a little bit different because they have a different set of permissions than namespaces. The update tag, delete tag, write tag value and delete tag value permissions are all inherited from the create namespace permission on the parent namespace. The read tag value permission is inherited from the list namespaces permission on the parent namespace. Control permissions are inherited from the namespace’s control permissions.
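To make the inheritance rules concrete, here is a minimal Python sketch of an in-memory model. It is not the Fluidinfo API: the function names, the short permission labels, the user names, and the idea of representing a permission as a plain set of allowed users are all assumptions made for illustration. The check that the caller is allowed to create things under the parent is left out.

```python
# Toy model of automatic creation with inherited permissions.
# Assumption: a permission is just a set of user names allowed to act.
NAMESPACE_ACTIONS = ("create", "update", "delete", "list", "control")


def child_namespace_permissions(parent):
    """Copy each namespace permission one-to-one from the parent."""
    return {action: set(parent[action]) for action in NAMESPACE_ACTIONS}


def new_tag_permissions(parent):
    """Derive a new tag's permissions from its parent namespace."""
    return {
        "update": set(parent["create"]),
        "delete": set(parent["create"]),
        "write-value": set(parent["create"]),
        "delete-value": set(parent["create"]),
        "read-value": set(parent["list"]),
        "control": set(parent["control"]),
    }


def ensure_tag(namespaces, tags, path):
    """Create missing namespaces and the tag itself on first use."""
    *namespace_parts, _tag_name = path.split("/")
    parent_path = namespace_parts[0]  # the user's top-level namespace exists
    for part in namespace_parts[1:]:
        child_path = parent_path + "/" + part
        if child_path not in namespaces:
            namespaces[child_path] = child_namespace_permissions(
                namespaces[parent_path])
        parent_path = child_path
    if path not in tags:
        tags[path] = new_tag_permissions(namespaces[parent_path])
    return tags[path]


# A locked-down top-level namespace: only alice may do anything.
namespaces = {"alice": {action: {"alice"} for action in NAMESPACE_ACTIONS}}
tags = {}
rating = ensure_tag(namespaces, tags, "alice/private/books/rating")

# Later changes to the parent do not propagate to existing children.
namespaces["alice"]["list"].add("bob")
```

In this model the new tag stays private because its read permission was copied from the parent's list permission when the tag was created, and opening up the parent afterwards leaves the children as they were.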

The combination of automatic namespaces and tags with inherited permissions makes Fluidinfo both easier and safer to use. We hope you enjoy these changes!

A shared writable object for everything: Sudoku problems

June 15th, 2011 by Terry Jones

Image: wikipedia

Fluidinfo is composed of objects that have tags with values. One of the tags is special, the about tag (its full name is actually fluiddb/about). One of its properties is that its values are unique across Fluidinfo. In other words, there can only be one object with any particular about value. It’s a bit like Wikipedia, which you can think of as having a single shared writable page for every topic imaginable. Fluidinfo offers a shared writable object for everything imaginable. The last part of a Wikipedia URL is like a Fluidinfo about tag value. That’s very wiki-like, but other properties of Fluidinfo make it more useful for applications than a wiki is. For example, it has a permissions system, its data is typed, and it has a query language. If you’d like to learn more about the about tag, there’s an entire blog named after it, run by Nicholas Radcliffe.

This morning I was having breakfast with Esteve Fernández and Jamu Kakar, and Jamu mentioned a friend who’s heavily into Sudoku. I mentioned that there are several mobile apps that let you take a picture of a Sudoku puzzle and then solve it for you. I also said “Fluidinfo has an object for every Sudoku puzzle”.

I thought I’d write a very short blog post to illustrate this. So I took the Sudoku puzzle from the Wikipedia entry and made a Fluidinfo object for it. I read the cells of the puzzle left-to-right, top-to-bottom, and used a hyphen to indicate an empty cell in the starting configuration. Written as a single string of characters, that’s 53--7----6--195----98----6-8---6---34--8-3--17---2---6-6----28----419--5----8--79. I used the Fluidinfo Explorer to create an object in Fluidinfo with that as the about value. Then I put a tag called terrycojones/solution onto that object, with value 534678912672195348198342567859761423426853791713924856961537284287419635345286179, which is the correct solution to the puzzle, again read left-to-right, top-to-bottom.
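The encoding is simple enough to sketch in a few lines of Python. The two strings are the about value and the solution quoted above; the helper names are mine, and nothing here talks to Fluidinfo. The sketch flattens a grid into the 81-character convention and checks that a candidate solution agrees with the starting cells and is a complete Sudoku grid.

```python
ABOUT = ("53--7----6--195----98----6-8---6---34--8-3--17---2---6"
         "-6----28----419--5----8--79")
SOLUTION = ("534678912672195348198342567859761423426853791"
            "713924856961537284287419635345286179")


def grid_to_about(rows):
    """Flatten rows of cells (0 or None means empty) left-to-right,
    top-to-bottom, writing a hyphen for each empty cell."""
    return "".join(str(cell) if cell else "-" for row in rows for cell in row)


def units(cells):
    """Return the 9 rows, 9 columns and 9 boxes of an 81-character grid."""
    rows = [cells[r * 9:(r + 1) * 9] for r in range(9)]
    cols = [cells[c::9] for c in range(9)]
    boxes = [
        "".join(cells[(br * 3 + r) * 9 + bc * 3 + c]
                for r in range(3) for c in range(3))
        for br in range(3) for bc in range(3)
    ]
    return rows + cols + boxes


def is_solution_of(about, solution):
    """True if `solution` keeps every given cell and every unit holds 1-9."""
    if len(about) != 81 or len(solution) != 81:
        return False
    givens_match = all(a in ("-", s) for a, s in zip(about, solution))
    complete = all(sorted(unit) == list("123456789")
                   for unit in units(solution))
    return givens_match and complete


first_row = grid_to_about([[5, 3, 0, 0, 7, 0, 0, 0, 0]])
solved = is_solution_of(ABOUT, SOLUTION)
```

An application following this convention could run a check like `is_solution_of` before attaching a solution tag to the puzzle's object.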

This illustrates a few things. First, Fluidinfo really does have an object for all Sudoku puzzles (created as needed, of course). Second, I’ve established a convention for the about tag value to represent those things. I could have done this in many different ways, and the solution I chose may not be the best. If I were intending to add information about lots of Sudoku puzzles, I would publish my choice and encourage others to follow it (which anyone could do, since any Fluidinfo user may create an object with any about tag – if the object already exists it’s no problem, you just get the already existing object). Third, the terrycojones/solution tag I put onto the object may not be of much use to the wider world. But, I could give other people (or applications) that I trust write permission for that tag so they could tag the objects too. Fourth, if I thought those solutions were worth something, I could make the tag unreadable by default and try to sell access to it (i.e., allow only those who paid me to have read access).

Finally, this illustrates the simple way in which isolated activities, like individually solving a puzzle, can be made social through shared writable storage. If I used a Sudoku application that tagged the shared object with some subset of terrycojones/solved or terrycojones/working-on-it or terrycojones/too-hard or terrycojones/solution-time tags, and others used that application too, solving Sudoku puzzles would instantly be social. Any Sudoku application could use Fluidinfo to allow you to see puzzles your friends couldn’t do or were working on, it could show you the amount of time your friends took, give you hints, find errors, etc. It’s easy to think of a ton of possible social extensions to the Sudoku world, and these include collaborative efforts.
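As a rough illustration of the last two points, here is a toy in-memory stand-in for the shared object store, written in Python. It assumes each user tags under their own namespace (the users alice and bob are made up), and the class and method names are hypothetical rather than part of any Fluidinfo library.

```python
class ObjectStore:
    """Toy stand-in for Fluidinfo: exactly one object per about value."""

    def __init__(self):
        self._objects = {}

    def object_for(self, about):
        # Asking for an object that already exists returns the existing one.
        return self._objects.setdefault(about, {"fluiddb/about": about})

    def tag(self, about, tag_path, value=None):
        self.object_for(about)[tag_path] = value

    def users_with(self, about, tag_name):
        """Users who have put <user>/<tag_name> on the object."""
        users = []
        for path in self.object_for(about):
            user, _, name = path.partition("/")
            if name == tag_name:
                users.append(user)
        return sorted(users)


PUZZLE = ("53--7----6--195----98----6-8---6---34--8-3--17---2---6"
          "-6----28----419--5----8--79")

store = ObjectStore()
store.tag(PUZZLE, "terrycojones/solved")
store.tag(PUZZLE, "terrycojones/solution-time", 1260)  # seconds, invented
store.tag(PUZZLE, "alice/too-hard")
store.tag(PUZZLE, "bob/too-hard")

stuck = store.users_with(PUZZLE, "too-hard")
```

Because everyone writes to the same object, a query as small as `users_with` is enough for one player's application to learn which friends found the puzzle too hard.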

This is a nice example of how shared writable storage with an object for everything allows formerly isolated actions to easily be made social. I’m planning to write up some more simple examples. There are many of them.

Personalized filtering of friend requests in social networks

June 1st, 2011 by Terry Jones

In an O’Reilly Radar article titled Getting closer to the web2.0 address book in October 2010, I described how a set of applications might coordinate via shared storage to answer Tim O’Reilly’s question:

Given that knowledge about who I know is in my phone, in the O’Reilly org chart, and in the set of authors of O’Reilly books, why do I still have to manually approve friend requests from Facebook, LinkedIn, etc.?

In that article I argued that this aspect of what Tim calls the web 2.0 address book should be solved not by an application, or even by applications making API calls to one another, but by a set of applications that communicate asynchronously via shared writable storage. As well as shared storage, also needed are a permissions system, a query language, and conventions used by the cooperating applications. No API calls are needed between applications. The interaction between the cooperating applications takes place entirely via shared storage. API calls are only used to add, retrieve, and query data in the shared storage.

At Gluecon in Denver last week, I gave a talk titled The Real Promise of Cloud Storage in which I argued that shared storage gives two big advantages. The second was that it allows the evolution of “asynchronous data protocols”. The final slides of the presentation showed the series of steps that would be taken to build what Tim was asking for. I wasn’t able to go into the detail of how this would work in the time available. So in this post I’ll give the detail of how this can be implemented using Fluidinfo. The post is long because I want to spell out the steps, the assumptions, the Fluidinfo tag permissions, the queries, the security aspects, and so on.

The business case

I’m going to pretend LinkedIn is the company that would like its users to have more flexibility in approving friend requests. Additional flexibility could include personalized friend resolvers (for example based on the O’Reilly org chart) and also applications that could assist in various ways and, if desired, even automate some responses (e.g., people you have called more than twice in the last month, or people you have put on a particular list on Twitter).

LinkedIn is a good candidate because they have traditionally placed less emphasis on being a burgeoning social network, and more on link quality, by requiring a higher level of proof to create more meaningful reciprocal connections. That seems to be changing slowly now, though. You used to need to know another user’s email to connect to them, but that has been relaxed over the last couple of years. It’s obvious why LinkedIn would do this: knowing connections between their members is valuable. LinkedIn will now show me suggested people I might know, and let me reach out to connect with them with a single click. No need to go look up email addresses, etc.

I suspect many people have valid unanswered friend connection requests sitting in their LinkedIn inboxes. I do, and I almost never make time to deal with them. Those requests, if accepted, would add value to LinkedIn. The (valid) connection requests that sit unanswered for whatever reason represent unrealized value. So it’s very much in LinkedIn’s interest to help people connect. They should be interested in a solution to Tim’s dilemma, as should Facebook and any social network whose friending mechanism is reciprocal.

Using the shared writable storage of Fluidinfo to solve this problem

Here is how this could be achieved with Fluidinfo.

1. LinkedIn announces that it will support personalized friend resolution via Fluidinfo. Users will have control over what kinds of friend resolution they want to enable. LinkedIn will use the user name in Fluidinfo. (Fluidinfo makes it possible to put domain names on data. Every Fluidinfo user name that corresponds to a domain is reserved for its internet domain owner.)

2. Suppose Tim, a LinkedIn user, decides to try the new functionality. LinkedIn directs him to a page where he can choose a Fluidinfo user name (let’s say "tim") and create a Fluidinfo tag called tim/friend-request/linkedin. Tim sets the permissions so that the Fluidinfo user can both add and delete the tag on Fluidinfo objects. He then tells LinkedIn his Fluidinfo user name. This is necessary because LinkedIn will later create data in Fluidinfo on Tim’s behalf. Note that LinkedIn only knows Tim’s Fluidinfo user name, not his password.

3. Suppose a user Alice now tells LinkedIn that she is a friend of Tim’s. LinkedIn creates a new object in Fluidinfo and puts an instance of tim/friend-request/linkedin onto it with a value (perhaps in JSON format) holding the information Alice is willing to supply as evidence that she knows Tim. This could include things like her Twitter user name, her phone number, the name of a company she used to work at with Tim, etc. LinkedIn also adds a random request identifier (more on which below).

The object in Fluidinfo looks like this (the tag value is in the rectangle under the tag name):

Note that the object does not have an owner. That’s a fundamental feature of Fluidinfo: objects don’t have owners. Instead, tags on objects have permissions. That’s crucial in what follows, because it allows any application to add more information to the object representing the friend request.
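For readers who prefer data to diagrams, the request object can be pictured roughly as follows. This is a Python sketch, not Fluidinfo's storage format: apart from the tag name and the request identifier used later in this post, every field name and value is invented for illustration, as is the some-resolver/examined tag.

```python
import json

REQUEST_TAG = "tim/friend-request/linkedin"

# Evidence Alice chooses to supply; everything except request_id is made up.
evidence = {
    "request_id": 9871261721498793,
    "name": "Alice",
    "twitter": "alice",
    "phone": "+1 555 0100",
    "former-employer": "Example Corp",
}

# The object is just a bag of tags; there is no owner field anywhere.
request_object = {REQUEST_TAG: json.dumps(evidence)}

# Any other application can later attach its own tag to the same object
# without touching the tag that LinkedIn wrote.
request_object["some-resolver/examined"] = None

decoded = json.loads(request_object[REQUEST_TAG])
```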

A friend resolver based on Twitter friends: TwitResolve

4. Let’s suppose a keen-eyed developer notices the LinkedIn announcement and decides to write a Twitter-based friend resolver called TwitResolve. The developer grabs the internet domain and then gets the user name on Fluidinfo (as the domain owner, only they can do this, as above). TwitResolve will use two tags in Fluidinfo, and It sets the permissions on the tag so that only the Fluidinfo user called can read it.

5. Let’s suppose Tim hears of TwitResolve and decides to use it. The TwitResolve sign-up page asks Tim to change the permissions on the Fluidinfo tag tim/friend-request/linkedin so that the Fluidinfo user can read the tag. (The setting of the permission could be done in various ways, which we’ll ignore for now.) By giving TwitResolve permission to read the tag that LinkedIn puts onto objects, Tim enables TwitResolve’s participation in helping to resolve friend requests for him. TwitResolve also asks Tim for his Twitter username (and gets Tim to prove that).

6. TwitResolve runs a process periodically that asks Fluidinfo for the outstanding requests for Tim that it hasn’t already examined. The relevant query in Fluidinfo’s query language is has tim/friend-request/linkedin except has

7. In our case, this query finds the object shown above and TwitResolve examines the value of the tim/friend-request/linkedin tag. It talks to Twitter to discover whether Tim is following Alice (it can do this because Alice’s Twitter user name is in the tag value, and Tim provided his Twitter user name to TwitResolve). Suppose Tim is not following Alice. TwitResolve cannot conclude anything, so it simply puts a tag (with no value) onto the request object in order to avoid re-examining this request. At this point the Fluidinfo object looks as below and the friend request is still unfulfilled.
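Steps 6 and 7 amount to a small polling loop, sketched here in Python over an in-memory list of objects. The query helper only emulates the "has ... except has ..." form; the examined tag path, Tim's Twitter name, and the `follows` callback (standing in for a call to Twitter) are assumptions made for this example.

```python
REQUEST_TAG = "tim/friend-request/linkedin"
EXAMINED_TAG = "twitresolve/examined"  # hypothetical tag path


def query(objects, has, except_has):
    """Emulate 'has <has> except has <except_has>' over a list of objects."""
    return [obj for obj in objects if has in obj and except_has not in obj]


def resolver_pass(objects, follows):
    """Examine outstanding requests once; return those that were confirmed."""
    confirmed = []
    for obj in query(objects, REQUEST_TAG, EXAMINED_TAG):
        details = obj[REQUEST_TAG]
        if follows("tim", details["twitter"]):
            confirmed.append(obj)
        obj[EXAMINED_TAG] = None  # a tag with no value: "already looked at"
    return confirmed


objects = [{REQUEST_TAG: {"request_id": 9871261721498793, "twitter": "alice"}}]

# Tim does not follow Alice, so nothing is confirmed on the first pass ...
first = resolver_pass(objects, follows=lambda follower, followed: False)
# ... and a second query finds nothing left to examine.
second = query(objects, REQUEST_TAG, EXAMINED_TAG)
```

After the pass the object carries both the original request tag and the valueless examined tag, so the request is still open for other resolvers.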

A friend resolver based on the O’Reilly org chart

8. News gets out quickly to the digerati and someone inside O’Reilly decides to write a resolver based on the O’Reilly org chart. They create two tags, similar to those created by TwitResolve, called and, again setting the permissions on the former so that only the Fluidinfo user called can read that tag. Tim decides to use the new resolver, so he gives the user permission to read the tim/friend-request/linkedin tag.

9. The O’Reilly resolver might be started by cron each night. When it runs, it sends a Fluidinfo query, just like TwitResolve does. Suppose Alice does not work at O’Reilly. The details in the request cannot confirm her as a friend of Tim’s. Just as with TwitResolve, the O’Reilly program puts an tag onto the object, and we arrive at the following:

A friend resolver based on Amazon books

10. To give a third example (to fulfill Tim’s requirements), another resolver could be written based on authors of books. Suppose it’s written by someone at Amazon. (The details of how this might successfully match Alice to Tim aren’t important here.) Suppose it is also unsuccessful in matching this friend request. It adds an tag to the object:

An iPhone friend resolver: iPhoneFriender

11. Next, someone decides to write a resolver that can match based on phone numbers, called iPhoneFriender. They get the domain name and user name in Fluidinfo. They will use the tags ( and, and set the permissions on the tag so that only the Fluidinfo user can read it.

When the iPhoneFriender application runs, it sends the Fluidinfo query has tim/friend-request/linkedin except has and finds our object. It reads the request details. Suppose it finds Alice’s phone number in Tim’s phone address book, and can see whether Tim has called Alice or vice-versa, whether Tim has chosen not to accept her calls, etc. iPhoneFriender has preference settings to allow Tim to specify what kinds of requests it will ask him to confirm, and which to automatically accept. Suppose that one way or another the friend request is accepted.

12. The iPhoneFriender application needs to tell LinkedIn that Alice is recognized as a friend of Tim’s. So it puts an tag with value 9871261721498793 onto the original object. This is the unique request_id value from the original request. Remember that the tag has its permissions set so that only can read it. Hiding the identifier in this way prevents rogue applications from falsely claiming to have satisfied friend requests. Only an application that Tim has given permission to read the tim/friend-request/linkedin tag to could know the request identifier. Only LinkedIn can read the identifier value out of the tag attached by iPhoneFriender. iPhoneFriender also adds an tag to the object to avoid repeating work.
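The reason the hidden identifier works as proof can be shown with a toy permission model in Python. This is only a sketch: the class, the user names, and the accepted tag path are hypothetical, and a permission is reduced to a set of users allowed to read a tag.

```python
class TaggedObject:
    """Toy object whose tags each carry their own set of allowed readers."""

    def __init__(self):
        self._tags = {}  # tag path -> (value, set of users allowed to read)

    def put(self, path, value, readers):
        self._tags[path] = (value, set(readers))

    def read(self, user, path):
        value, readers = self._tags[path]
        if user not in readers:
            raise PermissionError(f"{user} may not read {path}")
        return value


REQUEST_TAG = "tim/friend-request/linkedin"
ACCEPTED_TAG = "iphonefriender/accepted"  # hypothetical tag path

obj = TaggedObject()
# LinkedIn writes the request; Tim has let iPhoneFriender read it.
obj.put(REQUEST_TAG, {"request_id": 9871261721498793},
        readers={"tim", "linkedin", "iphonefriender"})

# iPhoneFriender echoes the identifier in a tag only LinkedIn can read.
request_id = obj.read("iphonefriender", REQUEST_TAG)["request_id"]
obj.put(ACCEPTED_TAG, request_id, readers={"linkedin"})


def can_read(user, path):
    """True if `user` is allowed to read the tag at `path` on the object."""
    try:
        obj.read(user, path)
        return True
    except PermissionError:
        return False
```

An application Tim has not authorized can read neither tag, so it has no way to learn the identifier it would need to forge an acceptance.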

The Fluidinfo object now looks as follows (tag values, when present, are shown in rectangles under the tag names):

Friendship, requited!

13. Later, LinkedIn sends Fluidinfo the query has tim/friend-request/linkedin. The search matches our object and LinkedIn then gets a list of its tags. For tags named */accepted (where * matches any user name) it tries to read the value of the tag. If it finds a tag whose value matches the identifier in the request tag on the object, LinkedIn adds the friendship link inside its own site. It also deletes the tim/friend-request/linkedin tag from the object, resulting in:

At this point we’re basically done. There are many possible variations on the above. For example, using timestamps to retry friend resolution, using timestamps to only examine recent requests, using separate tags to hold pieces of friend request information, using an extra tag to reduce the number of queries it must make to find outstanding requests, etc. Applications are not forced to use an examined tag to avoid repeating work; if they do they can name it whatever they please. It’s also easy to imagine more exotic resolvers, e.g., Tim giving people a secret random number and looking for that in the request (LinkedIn would have to allow Alice to add it to the request, obviously), etc. Participating applications could also clean up by finding objects with their examined tag but that no longer have a tim/friend-request/linkedin tag, and removing their examined and accepted tags (if present).
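The check LinkedIn performs in step 13 can be sketched in a few lines of Python. As before this is an in-memory illustration under stated assumptions: objects are plain dictionaries, the resolver tag paths are invented, and permission checks are omitted.

```python
REQUEST_TAG = "tim/friend-request/linkedin"


def settle_request(obj):
    """Return the accepting tag path if the request was validly accepted."""
    request_id = obj[REQUEST_TAG]["request_id"]
    for path, value in list(obj.items()):
        if path.endswith("/accepted") and value == request_id:
            del obj[REQUEST_TAG]  # the request is resolved
            return path
    return None


accepted = {
    REQUEST_TAG: {"request_id": 9871261721498793},
    "twitresolve/examined": None,
    "iphonefriender/examined": None,
    "iphonefriender/accepted": 9871261721498793,
}
forged = {
    REQUEST_TAG: {"request_id": 9871261721498793},
    "rogue/accepted": 1234,  # a guessed identifier that does not match
}

winner = settle_request(accepted)
rejected = settle_request(forged)
```

Only the object whose accepted tag carries the matching identifier loses its request tag; the forged acceptance is ignored and the request stays outstanding.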

Why this is nice

The above dance is nice for several reasons:

  • This is an open, convention-based, extensible, and validated application ecosystem. LinkedIn just writes data to Fluidinfo and periodically checks for resolution. In effect it is giving Tim the power to use any application he wants to resolve the request in any way he wants. Participating applications just follow the established tag naming convention. LinkedIn knows that if any application attaches a tag of the form */accepted with value 9871261721498793 to the object, it must represent a validated acceptance by Tim.
  • As a result, anyone can play. An idiosyncratic resolver such as that based on the O’Reilly org chart is as legitimate a contributor as any other. None of the resolvers needed to ask for permission (from LinkedIn) to participate, or needed to be anticipated by LinkedIn.
  • There are no API calls between the applications involved. This is significant because APIs have to be designed in advance and you need permission to use them. In our scenario, all communication between applications is done via a data protocol: adding to and retrieving from shared storage.
  • LinkedIn is free to ignore the */accepted tags from any application if it chooses.
  • Tim can withdraw permission for a resolver to work on his behalf, simply by taking away that application’s permission to read tim/friend-request/linkedin tags.
  • Tim can stop LinkedIn from creating friend resolution tags by removing the user’s permission to add the tim/friend-request/linkedin tag to objects. That would be somewhat extreme, seeing as LinkedIn is likely to offer its users a simpler way to turn off the feature, but it’s worth pointing out that Tim has control.
  • To those who don’t have permission to read tim/friend-request/linkedin tags, there is no way to see who has made the friend request. (The fact that Tim is the target could also be easily obscured, if wanted.)
  • All communication is convention-based and asynchronous. This resembles the way we (and other organisms) often communicate in natural systems. I suspect most information communication between living organisms is asynchronous, though I have no way to quantify this. Asynchronous communication via conventions in shared storage (e.g., those seen in Twitter with hashtags and @addressing) is so powerful because it is open-ended and evolutionary. Fit conventions (in the biological sense) will flourish. Conventions can be extended by any player, without harm. I wrote more on this in Dancing out of time: Thoughts on asynchronous communication.

Note that the above is just an example of how applications can communicate indirectly and asynchronously through shared storage using evolving conventions instead of using direct, synchronous, predefined API calls between one another. We have seen a solution to a difficult address book problem that has not involved writing an address book application. Instead, the problem is solved by a set of lightweight and loosely coupled cooperating applications communicating through data. I have (very slowly!) come to realize that this form of inter-application communication is an important part of what Fluidinfo makes possible. This is all enabled by the simple move to shared writable storage, coupled with a flexible permissions model and a query language.

Thanks for reading. I really hope you’ll find this as interesting as I do. Thanks to Nicholas Radcliffe, Tim O’Reilly, and Bar Shirtcliff for comments that greatly improved the above.

Speaking at Gluecon 2011 in Denver

May 14th, 2011 by Terry Jones

I’ll be speaking at Gluecon in Denver on May 25/26. The conference looks fantastic and there are lots of people going that I’m looking forward to catching up with. My talk (Thu May 26, 9:30am) is titled Evolution of inter-application data protocols via shared writable storage, with the following rather wordy abstract:

Cloud storage offers a variety of potential advantages: greater capacity, ease of scaling, lower cost of ownership, fewer operations staff, less hardware build-out, reduced responsibility for backups, etc. These are all straightforward and in a sense linear changes. There is another advantage though that is more interesting: the inherent value created when applications use shared storage. Shared storage holds the potential for unanticipated valuable operations, including search and mash-ups, that go beyond the linear value of isolated per-application storage. Shared storage also allows asynchronous inter-application communication based on data protocols that emphasize emergent agreed conventions rather than a priori rules. This is a sharp departure from inter-application communication via pre-specified synchronous remote procedure calls. In this talk I will elaborate on this point of view, with examples from evolutionary systems and discussion of Fluidinfo. I’ll argue that shared writable storage is the real promise of cloud storage, and show how it offers an approach to a class of problems which includes Tim O’Reilly’s oft-asked question “Where is the Web 2.0 address book?”

For a simpler description of what I mean by all this, read the pair of articles on the O’Reilly Radar site: Dancing out of time: Thoughts on asynchronous communication and Getting closer to the Web 2.0 address book. If you’re going to Gluecon, please say hi! If you’re not going and you’d like to, you can register here and use the discount code spkr12 to get 15% off.

How we built the O’Reilly API using Fluidinfo

March 22nd, 2011 by Nicholas Tollervey

In case you haven’t noticed, we’ve imported the O’Reilly catalogue into Fluidinfo, giving O’Reilly an instantly writable API for their data.

How did we do it..?

There were three basic steps:

  1. Get the raw data.
  2. Clean the raw data.
  3. Import the cleaned data.

That’s it!

I’ll take each step in detail…

Get the raw data

Since we had neither an existing raw dump of the data nor access to O’Reilly’s database, we had to think of some other way to get the catalogue. We found that O’Reilly had two existing data services we could use: OPMI (O’Reilly Product Metadata Interface) and an affiliate’s API within Safari.

Unfortunately the RDF returned from OPMI is complicated. We’d either have to become experts in RDF or learn how to use a specialist library to get at the data we were interested in. We didn’t have time to pursue either of these avenues. The other alternative, the Safari service, just didn’t work as advertised. 🙁

Then we remembered learning about @frabcus and @amcguire62‘s ScraperWiki project.

Put simply, ScraperWiki allows you to write scripts that scrape (extract) information from websites and store the results for retrieval later. The “wiki” aspect of the ScraperWiki name comes from its collaborative development environment where users can share their scripts and the resulting raw data.

In any case, a couple of hours later I had the beginnings of a batched up script for scraping information from the O’Reilly catalogue on the website. After some tests and refactoring ScraperWiki started to do its stuff. The result was a data dump in the easy to understand and manipulate CSV or JSON formats. ScraperWiki saves the day!

Clean the raw data

This involved massaging the raw data into a meaningful structure that corresponded to the namespaces, tags and tag-values we were going to use in Fluidinfo. We also extracted some useful information from the raw data. For example, we made sure the publication date of each work was also stored in a machine-readable value. Finally, we checked that all the authors and books matched up.

Most of this work was done by a single Python script. It loaded the raw data (in JSON format), cleaned it and saved the cleaned data as another JSON file. This meant that we could re-clean the raw data any number of times when we got things wrong or needed to change anything. Since this was all done in-memory it was also very fast.

The file containing the cleaned data was simply a list of JSON objects that mapped to objects in Fluidinfo. The attributes of each JSON object corresponded to the tags and associated values to be imported.
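As a rough illustration of the load, clean, save cycle described above, here is a minimal sketch. It is not the script we actually used: the raw field names ("title", "published") and the day/month/year date format are assumptions made up for the example.

```python
import json
from datetime import datetime


def clean_record(raw):
    """Return a cleaned copy of one scraped record.

    The raw field names ("title", "published") and the "%d/%m/%Y"
    date format are assumptions made for this sketch.
    """
    cleaned = {"title": raw["title"].strip()}
    # Also store the publication date in machine-readable forms.
    published = datetime.strptime(raw["published"].strip(), "%d/%m/%Y")
    cleaned["publication-date"] = published.strftime("%Y-%m-%d")
    cleaned["publication-month"] = published.month
    cleaned["publication-day"] = published.day
    return cleaned


def clean_file(raw_path, cleaned_path):
    """Load the raw JSON dump, clean every record and save the result."""
    with open(raw_path, encoding="utf-8") as f:
        raw_records = json.load(f)
    cleaned = [clean_record(record) for record in raw_records]
    with open(cleaned_path, "w", encoding="utf-8") as f:
        json.dump(cleaned, f, indent=2)
    return len(cleaned)
```

Because the raw file is never modified, a script shaped like this can be re-run as often as needed whenever the cleaning rules change.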

Import the cleaned data

This stage took place in two parts:

  1. Create the required namespaces and tags
  2. Import the data by annotating objects

Given the cleaned data we were able to create the required namespaces and tags. You can see the resulting tree-like structure in the Fluidinfo explorer (on the left hand side).
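To give a feel for how the namespaces and tags can be derived from the cleaned data, here is a small sketch. It assumes each cleaned JSON object is a dictionary whose keys are full tag paths such as "example/authors/name"; that layout and the "example" namespace are assumptions for illustration, not a description of our actual files.

```python
def namespaces_and_tags(objects):
    """Collect every tag path and every namespace that must exist.

    Assumes each object is a dict keyed by full tag paths, for
    example "example/authors/name" (a hypothetical layout).
    """
    tags = set()
    namespaces = set()
    for obj in objects:
        for path in obj:
            tags.add(path)
            # Everything before the final path component is a namespace,
            # and each of its ancestors is a namespace as well.
            parts = path.split("/")[:-1]
            for depth in range(1, len(parts) + 1):
                namespaces.add("/".join(parts[:depth]))
    # Sort namespaces so that parents come before their children.
    ordered = sorted(namespaces, key=lambda ns: (ns.count("/"), ns))
    return ordered, sorted(tags)
```

Returning the namespaces parents-first means they can be created in order without ever referring to a parent that does not exist yet.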

Next, we simply iterated over the list of JSON objects and pushed them into Fluidinfo. (It’s important to note that network latency means that importing data can seem to take a while. We’re well aware of this and will be blogging about best practices at a later date.)
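The import loop itself can be sketched in a few lines. In this sketch set_tag_value is a hypothetical callable supplied by the caller, standing in for whatever client library does the real work; it is not FOM’s or flimp’s actual API. The "about" key used to identify each object is also an assumption.

```python
def import_objects(objects, set_tag_value, about_key="about"):
    """Push every tag value of every cleaned object.

    set_tag_value(about, path, value) is a hypothetical, caller-supplied
    function standing in for a real Fluidinfo client library.
    """
    written = 0
    for obj in objects:
        about = obj[about_key]
        for path, value in obj.items():
            if path == about_key:
                continue  # the about value identifies the object itself
            # In a naive import each call would be a separate request,
            # which is where network latency adds up.
            set_tag_value(about, path, value)
            written += 1
    return written
```

Passing the writer in as a function also makes it easy to do a dry run, for example by collecting the calls in a list before touching the network.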

That’s it!

We used Ali Afshar’s excellent FOM (Fluid Object Mapper) library both for creating the namespaces and tags and for importing the JSON objects into Fluidinfo, along with elements of flimp (the FLuid IMPorter) for pushing the JSON into FOM.

What have we learned..? The most time consuming part of the exercise was scraping the data. The next most time consuming aspect was agreeing how to organise it. The actual import of the data didn’t take long at all.

Given access to the raw data and a well thought out schema we could have done this in an afternoon.

Announcing a writable API for O’Reilly books and authors

March 21st, 2011 by Terry Jones

Today we’re excited to announce the release of a writable API for O’Reilly books and authors. There’s far too much news and information around this release to pack into a single blog post. Here’s a summary of what’s new today and where to find out more.

Here’s an extract from the press release:

General manager and publisher Joe Wikert is excited by the opportunities that a writable API provides to O’Reilly and other publishers. “It’s like LEGOs for publishing,” he says of the new malleability in his industry. “It’s as though we’ve been selling plastic children’s toys and the pieces were all glued together so customers could only use them the way we intended them to be used,” he adds. “Now we’ve decided to break the pieces into their component parts and let customers build whatever they want.”

Last but not least: if you want a modern, writable API for your data, drop us a line at info at fluidinfo com, and let’s talk.

The structure of O’Reilly book and author data in Fluidinfo

March 21st, 2011 by Nicholas Tollervey

This short post explains how the O’Reilly catalog is represented in Fluidinfo.

Put simply, we annotate two types of object: those representing products (usually books) and those representing authors. We annotate them using namespaces and tags within O’Reilly’s top-level namespace, so you can be sure that this is bona fide O’Reilly information.

Within this namespace we store a bunch of “top level” tags that describe a product in the O’Reilly catalogue (title, summary, URL and so on). The namespace has two child namespaces: “authors” and “media”. (If you want a visual representation of this structure, head on over to the Fluidinfo explorer and explore, starting from the tree menu on the left hand side.)

The authors namespace contains tags that define information about an author (name, biography, homepage and so on) and also contains a child namespace called “expertise”. The expertise namespace contains a set of tags that map to the list of areas of expertise that O’Reilly uses to categorise their authors. So, for example, an object representing the O’Reilly author “Chris DiBona” looks like this:

Notice how Chris’s object has tags under the O’Reilly namespace, including several under its expertise namespace. Importantly, the object also has tags that were not provided by the O’Reilly data. Terry has added a tag terrycojones/met to indicate (rather obviously) that he’s met Chris, and the fluiddb/about tag is used to indicate that the object is about the author called Chris DiBona.

What about the objects that represent books..? What do they look like..? Well let’s consider a current favourite of mine: “XMPP: The Definitive Guide”. Here’s how Nick Radcliffe’s excellent abouttag utility displays the object representing this book:

Whoa! Lots more tags! Many of them are from the O’Reilly domain (although notice how there are 15 missing). Once again it’s possible to see who/what else has been tagging the object. I’ve added a review and rating (ntoll/review and ntoll/rating) and various other people have annotated useful information that wasn’t originally in the dataset provided by O’Reilly.

How are authors and books linked..?

Every author object has an oreilly/authors/works tag that contains a list of the 13-digit O’Reilly ID / ISBN for each work they were involved in. Every book object has corresponding id and isbn tags (both described in the list of book tags below).

Alternatively, every book object has an author-urls tag that contains a list of its authors’ homepages on the O’Reilly website, and every author object has an associated url tag containing the same information.
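To make the linkage concrete, here is a minimal sketch that follows the link from an author to their books. It works on plain in-memory dictionaries rather than on Fluidinfo itself, and it assumes that the values in an author’s works list are the same strings as the id tag on each book object, as the tag descriptions in this post suggest. The ISBNs are placeholders.

```python
def books_for_author(author, books):
    """Return the book records listed in an author's works tag.

    Assumes author["works"] holds the same 13-digit strings that
    appear as each book's "id" value.
    """
    by_id = {book["id"]: book for book in books}
    # Keep the order of the works list; skip ids we have no record for.
    return [by_id[work_id] for work_id in author["works"] if work_id in by_id]


# Placeholder data for illustration only.
books = [
    {"id": "9780000000001", "title": "First Example"},
    {"id": "9780000000002", "title": "Second Example"},
]
author = {"name": "A. N. Author", "works": ["9780000000002", "9780000000009"]}
titles = [book["title"] for book in books_for_author(author, books)]
```

The same idea works in the other direction: index the authors by their url value and look up each entry in a book’s author-urls list.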

Finally, for the sake of completeness here’s a list of all the book and author tags along with a description of what each one represents:

Book tags

  • publication-day – The day of the month upon which the item was published.
  • publication-month – The number of the month within which the item was published.
  • duration – The duration of this item in minutes.
  • subtitle – The subtitle associated with the item.
  • id – The unique ID used by O’Reilly to identify the item, usually the 13-digit ISBN number (as a string).
  • page-count-is-estimate – A flag to indicate that any associated page count value is only an estimate.
  • cover-medium – The URL for a medium size image of the cover at the O’Reilly domain.
  • toc – The table of contents as text/html.
  • homepage – A URL to the item’s homepage on the O’Reilly website.
  • description – A long description of the item as text/html.
  • cover-small – The URL for a small size image of the cover at the O’Reilly domain.
  • author-urns – A list of unique reference numbers used by O’Reilly to reference the authors of the item.
  • cover-large – The URL for a large size image of the cover at the O’Reilly domain.
  • isbn – The 13-digit ISBN number (as a string).
  • safari-url – A URL to the item’s page on O’Reilly’s Safari service.
  • author-urls – A list of URLs pointing to the authors’ homepages on the O’Reilly website.
  • pages – The number of pages this item has.
  • publisher – The name of the publisher of the item.
  • price-us – The advertised US price in cents.
  • title – The title of the item.
  • author-names – A list of author names.
  • summary – A short summary of the item as text/html.
  • publication-date – The publication date as YYYY-MM-DD.
  • price-uk – The advertised UK price in pence.
  • media – A list of the types of media in which the item is available. Can be one or more of: ‘up-to-date’, ‘rough cut’, ‘dvd’, ‘ebook’, ‘kit’, ‘video’, ‘print’, ‘early release ebook’, ‘safari books online’ or ‘merchandise’.

Author tags

  • name – The author’s full name.
  • url – A URL to the author’s homepage on the O’Reilly website.
  • photo – A path to an image file containing a photo of the author hosted at the O’Reilly domain.
  • twitter – The author’s Twitter username.
  • works – A list of the ids of items that the author has created.
  • expertise – A list of the expertise tags associated with the author.
  • biography – The author’s biography as text/html.