Archive for the ‘Howto’ Category

See O’Reilly book, author & Radar content in context as you browse

Monday, April 30th, 2012

This video shows how the Fluidinfo Chrome extension displays relevant content, in context, while you browse:

We’ve added information to Fluidinfo for all the tags O’Reilly Radar have used on their articles since 2005. For example, nine articles were tagged with “patent reform”. Select those words anywhere you run into them on the web and a pop-up will show you links to the Radar posts. Content on almost 4000 topics that have been discussed on Radar is now just a click away. Because pop-ups are triggered when you select text, they are only displayed when and where relevant.

As you’ll see in the video, we’ve also added information about all O’Reilly books and authors.

To make things easier for first-timers, if you simply install the extension you’ll see all the O’Reilly content with no need to configure anything. You can log in and adjust things later if you like it and want to customize what you see. Give it a whirl—it’s fun and interesting.

Making Tumblr more social using Fluidinfo

Saturday, March 31st, 2012

There are about 50 million Tumblr blogs, so there’s a reasonable chance you or some of your friends are use Tumblr. It’s now possible to import Tumblr data into Fluidinfo with just a couple of clicks. If you follow someone on Fluidinfo who’s done that, and you’re a Chrome user, we have some great news for you. Have a look at this:

If you install the Fluidinfo Chrome extension and follow a few people, as shown in the video, you’ll soon be getting little pop-ups telling you when you stumble across things the people you follow have mentioned on Tumblr. Note that you don’t have to be a Tumblr user to do this, you can just follow some people who are.

Fluidinfo is now doing imports of all URLs, #hashtags, and @names that people mention on Twitter and on Tumblr and tying it all together. The chrome browser extension surfaces that information for you as you browse. We have a new Firefox extension that’s about to go into beta, and imports of Delicious and Diigo are in the pipeline, so stay tuned!

Browsing the web with the Fluidinfo Chrome extension

Wednesday, March 14th, 2012

Here’s a 4 minute video that shows what it’s like to browse the web with the Fluidinfo Chrome extension:

We’re really excited about where this is heading. The extension allows you to do several tricks not shown in the video, and we have more on the way. In a follow-up post we’ll take you through some of the features.

If you’re a programmer and want to get involved, or the extension gives you ideas for something you’d like to build, grab the source code from Github. We’d love to hear what you think. Comment below, email us at info@fluidinfo.com, or come hang out in the #fluidinfo channel on Freenode.

Import your tweets into Fluidinfo

Monday, February 6th, 2012

If you import your tweets into Fluidinfo, you can use our new web interface to see interesting information about all the @names, #hashtags and URLs you’ve mentioned on Twitter. To get going, just go to Fluidinfo and log in with Twitter (top right).

Fluidinfo will fetch your past tweets from Twitter and will examine them for all the @names, #hashtags, and URLs you’ve ever mentioned. That will probably take some minutes to complete (reload the page to check the import status).

For each of your tweets, we extract the @names, #hashtags, and URLs it mentions and we add to the Fluidinfo page for each of them. To make this more concrete, here are examples from people who have already imported their tweets into Fluidinfo:

Example: the #occupy hashtag Here’s the Fluidinfo page for #occupy. Click the links on the left of that page to explore different views of #occupy information in Fluidinfo and across the web. For example, you can see mentions of #occupy by Paul Kedrosky, mentions of #occupy by Tim O’Reilly, and mentions of #occupy by Ethan Zuckerman. On the right is a screenshot that shows some of how #occupy appears to me when I look at my friends’ mentions of it (click to see a larger version). Along with the page for #occupy, Fluidinfo has a page for every hashtag.


Example: a URL In December 2011, Fred Wilson blogged about Freedom To Innovate. Fluidinfo has a page for the URL of Fred’s post. Click the links on the left of that page to explore. Later, Tim O’Reilly tweeted about the post. Tim has imported his tweets into Fluidinfo, so you can see his "mentioned" tag on the URL of Fred’s post. Brad Feld, who has also imported his tweets, mentioned Fred’s post as well. On the right is a screenshot that shows some of how Fred’s post appears to me when I look at my friends’ mentions of it (click to see a larger version). Along with the page for Fred’s article, Fluidinfo has a page for every URL.

Example: @sarawinge Esther Dyson mentioned (via retweeting) Sara Winge, so on the @sarawinge page in Fluidinfo you should expect to see Esther’s mentions of Sara. If you explore the views on the left of that page, you’ll also see mentions of Sara by @marcprecipice, @pkedrosky, @timoreilly, and others. Along with the page for @sarawinge, Fluidinfo has a page for every @username.

Example: scientific american Just as it does for @names, Fluidinfo tags the user’s name (as given on Twitter). So because Joi Ito has mentioned @sciam (the Twitter account for Scientific American), the "scientific american" page in Fluidinfo has Joi’s mentions of @sciam. The views on the left of that page show mentions of Scientific American by others, as well as lots of other information from across the web. And yes, you guessed it, along with the page for "scientific american", Fluidinfo has a page for every name.

Add to these pages yourself! If you import your tweets, your mentions of @names, hashtags and URLs will also be added to pages in Fluidinfo. But don’t forget that you can also add your own tags to any page at all. After you log in, click the green Tag button to add something. Enter a tag name (e.g., comment) and a value (the text of your comment) and click Submit.

Next up… search In a follow-on post, I’ll show you how to use the secret terminal built into the Fluidinfo. The terminal lets you search your tweets, find things different people have mentioned in common, and much more besides.

How we built the O’Reilly API using Fluidinfo

Tuesday, March 22nd, 2011

In case you haven’t noticed, we’ve imported the O’Reilly catalogue into Fluidinfo thus giving them an instantly writable API for their data.

How did we do it..?

There were three basic steps:

  1. Get the raw data.
  2. Clean the raw data.
  3. Import the cleaned data.

That’s it!

I’ll take each step in detail…

Get the raw data

Since we didn’t have an existing raw dump of the data nor access to O’Reilly’s database we had to think of some other way to get the catalogue. We found that O’Reilly had two different existing data services we could use: OPMI (O’Reilly Product Metadata Interface) and an affiliate’s API within Safari.

Unfortunately the RDF returned from OPMI is complicated. We’d either have to become experts in RDF or learn how to use a specialist library to get at the data we were interested in. We didn’t have time to pursue either of these avenues. The other alternative, the Safari service, just didn’t work as advertised. :-(

Then we remembered learning about @frabcus and @amcguire62‘s ScraperWiki project.

Put simply, ScraperWiki allows you to write scripts that scrape (extract) information from websites and store the results for retrieval later. The “wiki” aspect of the ScraperWiki name comes from its collaborative development environment where users can share their scripts and the resulting raw data.

In any case, a couple of hours later I had the beginnings of a batched up script for scraping information from the O’Reilly catalogue on the oreilly.com website. After some tests and refactoring ScraperWiki started to do its stuff. The result was a data dump in the easy to understand and manipulate CSV or JSON formats. ScraperWiki saves the day!

Clean the raw data

This involved massaging the raw data into a meaningful structure that corresponded to the namespaces, tags and tag-values we were going to use in Fluidinfo. We also extracted some useful information from the raw data. For example, we made sure the publication date of each work was also stored in a machine-readable value. Finally, we checked that all the authors and books matched up.

Most of this work was done by a single Python script. It loaded the raw data (in JSON format), cleaned it and saved the cleaned data as another JSON file. This meant that we could re-clean the raw data any number of times when we got things wrong or needed to change anything. Since this was all done in-memory it was also very fast.

The file containing the cleaned data was simply a list of JSON objects that mapped to objects in Fluidinfo. The attributes of each JSON object corresponded to the tags and associated values to be imported.

Import the cleaned data

This stage took place in two parts:

  1. Create the required namespaces and tags
  2. Import the data by annotating objects

Given the cleaned data we were able to create the required namespaces and tags. You can see the resulting tree-like structure in the Fluidinfo explorer (on the left hand side).

Next, we simply iterated over the list of JSON objects and pushed them into Fluidinfo. (It’s important to note is that network latency means that importing data can seem to take a while. We’re well aware of this and will be blogging about best practices at a later date.)

That’s it!

We used Ali Afshar’s excellent FOM (Fluid Object Mapper) library for both creating the namespace and tags and importing the JSON objects into Fluidinfo and elements of flimp (the FLuid IMPorter) for pushing the JSON into FOM.

What have we learned..? The most time consuming part of the exercise was scraping the data. The next most time consuming aspect was agreeing how to organise it. The actual import of the data didn’t take long at all.

Given access to the raw data and a well thought out schema we could have done this in an afternoon.

The structure of O’Reilly book and author data in Fluidinfo

Monday, March 21st, 2011

This short post explains how the O’Reilly catalog is represented in Fluidinfo.

Put simply, we annotate two types of object: those representing products (usually books) and those representing authors. We annotate them using namespaces and tags within the oreilly.com top level namespace so you can be sure that this is bona fide O’Reilly information.

Within the oreilly.com namespace we store a bunch of “top level” tags that describe a product in the O’Reilly catalogue (title, summary, URL and so on). The oreilly.com namespace has two child namespaces: “authors” and “media“. (If you want a visual representation of this structure head on over to the Fluidinfo explorer and explore, starting from the tree menu on the left hand side.)

The authors namespace contains tags that define information about an author (name, biography, homepage and so on) and also contains a child namespace called “expertise“. The expertise namespace contains a set of tags that map to the list of areas of expertise that O’Reilly uses to categorise their authors. So, for example, an object representing the O’Reilly author “Chris DiBona” looks like this:

Notice how Chris’s object has tags under the oreilly.com/authors namespace including several under the oreilly.com/authors/expertise namespace. Importantly, the object also has tags that were not provided by the O’Reilly data. Terry has added a tag terrycojones/met to indicate (rather obviously) that he’s met Chris and the fluiddb/about tag is used to indicate that the object is about the author called Chris diBona.

What about the objects that represent books..? What do they look like..? Well let’s consider a current favourite of mine: “XMPP: The Definitive Guide”. Here’s how Nick Radcliffe’s excellent abouttag utility displays the object representing this book:

Whoa! Lots more tags! Many of them are from the oreilly.com domain (although notice how there are 15 missing). Once again it’s possible to see who/what else has been tagging the object. I’ve added a review and rating (ntoll/review and ntoll/rating) and various other people have annotated useful information that wasn’t at first in the dataset provided by O’Reilly.

How are authors and books linked..?

Every author object has an oreilly/authors/works tag that contains a list of the 13 digit O’Reilly ID / ISBN for each work they were involved in. Every book object has a corresponding oreilly.com/id and oreilly.com/isbn tag.

Alternatively, every book object has an oreilly.com/authors-urls tag that contains a list of it’s author’s homepages on the O’Reilly website and every author object has an associated oreilly.com/url containing the same information.

Finally, for the sake of completeness here’s a list of all the book and author tags along with a description of what each one represents:

Book tags

  • publication-day: The day of the month upon which the item was published.
  • publication-month – The number of the month within which the item was published.
  • duration – The duration of this item in minutes.
  • subtitle – The subtitle associated with the item.
  • id – The unique ID used by O’Reilly to identify the item, usually the 13-digit ISBN number (as a string).
  • page-count-is-estimate – A flag to indicate that any associated page count value is only an estimate.
  • cover-medium – The URL for a medium size image of the cover at the oreilly.com domain.
  • toc – The table of contents as text/html.
  • homepage – A URL to the item’s homepage on the O’Reilly website.
  • description – A long description of the item as text/html.
  • cover-small – The URL for a small size image of the cover at the oreilly.com domain.
  • author-urns – A list of unique reference numbers used by O’Reilly to reference the authors of the item.
  • cover-large – The URL for a large size image of the cover at the oreilly.com domain.
  • isbn – The 13-digit ISBN number (as a string).
  • safari-url – A URL to the item’s page on O’Reilly’s Safari service.
  • author-urls – A list of URLs pointing to the author’s homepages on the O’Reilly website.
  • pages – The number of pages this item has.
  • publisher – The name of the publisher of the item.
  • price-us – The advertised US price in cents.
  • title – The title of the item.
  • author-names – A list of author names.
  • summary – A short summary of the item as text/html.
  • publication-date – The publication date as YYYY-MM-DD.
  • price-uk – The advertised UK price in pence.
  • media – A list of the type[s] of media in which the item is available. Can be one or more of: ‘up-to-date’, ‘rough cut’, ‘dvd’, ‘ebook’, ‘kit’, ‘video’, ‘print’, ‘early release ebook’, ‘safari books online’ or ‘merchandise’”

Author tags

  • name – The author’s full name.
  • url – A URL to the author’s homepage on the O’Reilly website.
  • photo – A path to an image file containing a photo of the author hosted at the oreilly.com domain.
  • twitter – The author’s Twitter username.
  • works – A list of the ids of items that the author has created.
  • expertise – A list of the expertise tags associated with the author.
  • biography – The author’s biography as text/html.

Examples of Fluidinfo O’Reilly API queries

Monday, March 21st, 2011

This post is all about querying the O’Reilly book and author information recently imported into Fluidinfo. If you want the skinny on Fluidinfo’s query language in glorious in-depth techno-geek-speak then check out the documentation. If you’d rather see some real world examples, read on…

In Fluidinfo, objects represent things (and all objects have a unique id). Information is added to objects using tags. Tags can have values, and tag names are organized into namespaces that give them context. Permissions control who can see and use namespaces and tags.

Objects do not belong to anyone and don’t have permissions associated with them. They’re openly writable. Anyone can tag anything to any object. Many objects have a special globally unique “about” tag value that indicates what they are about. Interaction with Fluidinfo is via a REST API.

That’s Fluidinfo in a nutshell.

In another article published today I describe the Fluidinfo tags and namespaces used to annotate objects with O’Reilly data. The tags are attached to objects for O’Reilly books and authors. Both kinds of objects have about tags. So a trivial first kind of query is to go directly to an object that’s about a book. For example, to get information about the object representing the book “Open Government” visit the URL http://fluiddb.fluidinfo.com/about/book:open government (daniel lathrop; laurel ruma).

You’ll get back a JSON response containing a list of all the tags (that you have permission to read) attached to that object and the object’s globally unique id. Similarly, you can go directly to the object for an O’Reilly author http://fluiddb.fluidinfo.com/about/author:tim oreilly.

In case you’re wondering about the format of these book and author about tags, we used the abouttag library written by Nicholas Radcliffe to generate them. They’re designed to be readable, easy to generate programmatically, and unlikely to result in collisions. You don’t have to remember them though, as there are many other ways to get at objects, via querying, as we’re about to see.

Queries on tags and their values

Below are some examples of using Fluidinfo’s query language.

Presence

Return all the objects that have an O’Reilly title:

has oreilly.com/title

You can see the results at the following URL: http://fluiddb.fluidinfo.com/objects?query=has oreilly.com/title. Once again, the result is in JSON. It simply contains a list of the ids of matching objects (representing things that O’Reilly have tagged with a title).

That’s the equivalent of the following SQL statement:

SELECT id FROM oreilly.com WHERE title IS NOT NULL;

Caveat: There are no tables in Fluidinfo so it’s impossible to make a direct translation to SQL. This example and those that follow simply illustrate a conceptual equivalence to make it easier for those of you familiar with SQL to get your heads around the Fluidinfo query language.

Comparison

Return all the O’Reilly objects whose price is less than $40 (the price is stored in cents).

oreilly.com/price-us < 4000

Here it is as a URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/price-us < 4000

In SQL it would be:

SELECT id FROM oreilly.com WHERE price-us < 4000;

Text Matching

Return all the O’Reilly objects that have “Python” in the title.

oreilly.com/title matches "Python"

The resulting URL: https://fluiddb.fluidinfo.com/objects?query=oreilly.com/title matches “Python”

In SQL:

SELECT id FROM oreilly.com WHERE title LIKE '%Python%';

Set Contents

Return all the O’Reilly objects representing authors who were involved in writing the work with ISBN “9781565923607″ (which is the unique ID O’Reilly use in their catalog). The value of oreilly.com/authors/works tags is always a set of unique ISBN numbers like this: ["9781565923607", "9781565563728", "9781627397284"].

oreilly.com/authors/works contains "9781565923607"

The resulting URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/authors/works contains “9781565923607″

In SQL:

SELECT id FROM oreilly.com/authors WHERE '9781565923607' IN (SELECT works FROM oreilly.com/authors);

(Actually, the similar “IN” operation in SQL isn’t a very good example since it results in verbose monstrosities like the above.)

Exclusion

Return all the O’Reilly books that were published in 2001 except those published in April.

oreilly.com/publication-year=2010 except oreilly.com/publication-month=4

The resulting URL: https://fluiddb.fluidinfo.com/objects?query=oreilly.com/publication-year=2010 except oreilly.com/publication-month=4

In SQL:

SELECT id FROM oreilly.com WHERE year=2010 AND month<>4;

Logic

It’s possible to use the and and or logical operations. For example, return all the O’Reilly books whose title matches “Python” and were published before 2005:

oreilly.com/title matches "Python" and oreilly.com/publication-year < 2005

The resulting URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/title matches “Python” and oreilly.com/publication-year < 2005

In SQL:

SELECT id FROM oreilly.com WHERE title LIKE '%Python%' AND year < 2005

Grouping

Return all the objects representing O’Reilly books mentioning “Python” in their title that were published in either 2008 or 2010.

oreilly.com/title matches "Python" and (oreilly.com/publication-year=2008 or oreilly.com/publication-year=2010)

The resulting URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/title matches “Python” and (oreilly.com/publication-year=2008 or oreilly.com/publication-year=2010)

In SQL:

SELECT id FROM oreilly.com WHERE title LIKE '%Python%' AND (year = 2008 OR year = 2010);

Querying across different data sets

Fluidinfo can query seamlessly across tags from different sources that are stored on the same object. E.g., return the titles of all O’Reilly books that Terry Jones owns.

has oreilly.com/title and has terrycojones/owns

The resulting URL: http://fluiddb.fluidinfo.com/objects?query=has oreilly.com/title and has terrycojones/owns

In SQL:

Well, it’s actually not clear how you’d do this in SQL. Presumably there’d need to be some kind of table join, supposing that were possible!

Getting back tags on objects matching a query

It’s also possible to indicate which tag values to return for each matching object. This is done by using the Fluidinfo /values HTTP endpoint and specifying the tag values to return as arguments in the URL path. For example, if I wanted the title, author names and publication year of all the O’Reilly books with the word “Python” in the title published before 2006 then I’d use the following query:

oreilly.com/title matches "Python" and oreilly.com/publication-year < 2006

and append the wanted tags to the URL after the query (in any order):

&tag=oreilly.com/title&tag=oreilly.com/author-names&tag=oreilly.com/publication-year

The resulting URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/title matches “Python” and oreilly.com/publication-year < 2006&tag=oreilly.com/title&tag=oreilly.com/author-names&tag=oreilly.com/publication-year

This is similar to the following SQL:

SELECT title, authors, year FROM oreilly.com WHERE title LIKE '%Python%' AND year < 2006;

Fluidinfo returns a JSON object like this:

{u'results': {u'id': {u'1a91e021-7bce-4693-bfa5-0dc437fe1817':
    {u'oreilly.com/author-names': {u'value': [u'Anna Ravenscroft', u'David Ascher', u'Alex Martelli']},
     u'oreilly.com/publication-year': {u'value': 2005},
     u'oreilly.com/title': {u'value': u'Python Cookbook, Second Edition'}},
u'1d25baae-b977-4ff4-bb77-01c52bd1d339':
    {u'oreilly.com/author-names': {u'value': [u'Fredrik Lundh']},
     u'oreilly.com/publication-year': {u'value': 2001},
     u'oreilly.com/title': {u'value': u'Python Standard Library'}},
u'3360f05f-9bf4-4da5-abc0-0e3742809b98':
    {u'oreilly.com/author-names': {u'value': [u'Fred L. Drake Jr', u'Christopher A. Jones']},
     u'oreilly.com/publication-year': {u'value': 2001},
     u'oreilly.com/title': {u'value': u'Python &amp; XML'}},
u'9845b184-ef1b-46fb-8e7c-011da053dcb6':
    {u'oreilly.com/author-names': {u'value': [u'Andy Robinson', u'Mark Hammond']},
     u'oreilly.com/publication-year': {u'value': 2000},
     u'oreilly.com/title': {u'value': u'Python Programming On Win32'}}}}}

It’s also possible to update and delete tag values from matching objects. This process is explained in detail in the Fluidinfo documentation and this blog post.

Finally, rather than interacting with Fluidinfo directly using the raw HTTP API it’s a good idea to use one of the client libraries listed here. For example, using the fluidinfo.py library the last example query can be executed as follows:

>>> import fluidinfo
>>> import pprint
>>> headers, result = fluidinfo.call('GET', '/values', tags=['oreilly.com/title', 'oreilly.com/author-names', 'oreilly.com/publication-year'], query='oreilly.com/title matches "Python" and oreilly.com/publication-year < 2006')
>>> pprint.pprint(headers)
{'cache-control': 'no-cache',
 'connection': 'keep-alive',
 'content-length': '937',
 'content-location': 'https://fluiddb.fluidinfo.com/values?query=oreilly.com%2Ftitle+matches+%22Python%22+and+oreilly.com%2Fpublication-year+%3C+2006&tag=oreilly.com%2Ftitle&tag=oreilly.com%2Fauthor-names&tag=oreilly.com%2Fpublication-year',
 'content-type': 'application/json',
 'date': 'Thu, 10 Mar 2011 15:17:58 GMT',
 'server': 'nginx/0.7.65',
 'status': '200'}
>>> pprint.pprint(result)
{u'results': {u'id': {u'1a91e021-7bce-4693-bfa5-0dc437fe1817': {u'oreilly.com/author-names': {u'value': [u'Anna Ravenscroft',
... etc ...
 

Learn more

Hopefully, this has explained enough to get you started. If you don’t have a Fluidinfo account, you can sign up here. If you have any questions, please don’t hesitate to get involved with the Fluidinfo community, contact us directly or join us on IRC. We’ll be more than happy to help!

O’Reilly Fluidinfo Chrome extension

Monday, March 21st, 2011

To help people get going with the API competition announced today on the O’Reilly Radar site, Emanuel Carnevale has written a cool extension for Google’s Chrome browser. The extension shows some of the non-O’Reilly tags on the book objects and also lets you indicate which O’Reilly books you own. It does this by putting tags onto the objects representing O’Reilly books in Fluidinfo.

To install the extension onto your Chrome browser click on the following link (from within Chrome): https://fluiddb.fluidinfo.com/about/oreilly.com/fluidinfo/chrome-extension.crx. Your browser will guide you through what to do. It’s pretty obvious stuff. Once it’s installed you’ll see a new icon in the top right hand corner of the browser window between the address bar and the little spanner icon:

Click the icon and sign in with your Fluidinfo credentials. If you don’t yet have an account on Fluidinfo you can sign up here.

How do you use it..?

Simple. Go visit the O’Reilly catalog and click on one of the books you own. For example, I happen to be the proud owner of Natural Language Processing with Python. If you visit the page for the book you’ll notice a new small Fluidinfo icon in the book details:


Click the icon and you’ll see a pop-up like this:

You can click on the appropriate statement at bottom to indicate ownership or not, as the case may be.

The writable API gives us all a voice

The extension uses an “owns” tag in your top-level Fluidinfo namespace to indicate book ownership on the objects in Fluidinfo. For example, my tag is called “ntoll/owns”. The extension attaches this tag to the object representing the O’Reilly book whose page you are visiting.

Because the extension tags the exact same Fluidinfo objects that have the O’Reilly information, I can start to do some really cool searches. For example, I happen to know Terry has a particularly large O’Reilly “zoo” as do I (in fact, doesn’t every developer..?). We can see what books we both own about Python with the following query:

oreilly.com/title matches "Python" and has terrycojones/owns and has ntoll/owns

The following code snippet for running this query uses the fluidinfo.py client library from within the Python shell. Alternatively, you can see the result directly if you visit this URL.

>>> import fluidinfo
>>> import pprint
>>> headers, result = fluidinfo.call('GET', '/values', tags=['oreilly.com/title',], query='oreilly.com/title matches "Python" and has terrycojones/owns and has ntoll/owns')
>>> pprint.pprint(result)
{u'results': {u'id': {
                      u'01371c03-9097-4267-a137-ae88a23790ef': {u'oreilly.com/title': {u'value': u'Python Pocket Reference, Fourth Edition'}},
                      u'4e9c42b6-68cb-43f5-9b75-60af9c0bd5a7': {u'oreilly.com/title': {u'value': u'Programming Python, Fourth Edition'}},
                      u'cd0838db-96ae-42ae-98c9-248a1507e2bb': {u'oreilly.com/title': {u'value': u'Python in a Nutshell, Second Edition'}}}}}

This illustrates how anyone can add tags to the objects being used by O’Reilly, and can then search based on their own additions and those of others. That’s why we say that Fluidinfo provides writable APIs. Cool :-)

Run with it!

There’s obviously a lot more that could be done with this extension. We kept it simple mainly because we wanted to give an example of how such an extension could be written. We hope it can provide a basis for your own efforts, especially if you’re entering the O’Reilly API competition. Emanuel has released the source code for the extension so you can grab it from Github and take it from there!

Indicating (shared) interest in things without disclosing what they are

Saturday, March 5th, 2011

Imagine you want wanted to tell the world you were interested in something, for example an email address or a phone number, without telling the world what that thing was. That may not sound so interesting, but if several people were doing the same thing, it would be a mechanism for discovery of private things you had in common, without telling anyone else what those things were.

Russell Manley and I just thought of a simple way to do this using Fluidinfo. Here’s how we did it for the email addresses we know.

For each email address, compute its MD5 sum. Then, put a rustlem/knows or terrycojones/knows tag onto the object whose fluiddb/about value is the MD5 sum. The MD5 algorithm is essentially one-way, so even if someone finds a Fluidinfo object with either of our tags on it (which is trivial) they cannot recover the original email address.

This is pretty nice. We’re independently indicating things of interest, but neither of us is publicly saying what those things are. Because we’re putting our information onto the same objects in Fluidinfo, we can then easily discover things we have in common with each other (and with others), without the world knowing what. We can do the same thing for phone numbers, or anything else.

Getting the data into Fluidinfo was trivial. Here’s code I used to put a terrycojones/knows tag (with value True) onto the appropriate objects:

import sys, hashlib
from fom.session import Fluid

fdb = Fluid()
fdb.login('terrycojones', 'PASSWORD')

for thing in sys.stdin.readlines():
    about = hashlib.md5(thing[:-1]).hexdigest()
    fdb.about[about]['terrycojones/knows'].put(True)
 

You pass a list of email addresses to this script on standard input.

Russell and I each had about a thousand email addresses in our address books. A first question is how many addresses we know in common. You can get the answer to this with the simple Fluidinfo query has terrycojones/knows and has rustlem/knows. It turns out there are 53 common addresses. But the results don’t tell us which addresses those are, which is also interesting.

We also wrote a small script to print any tags ending in /knows for a set of email addresses given on the command line.

import sys, hashlib
from fom.session import Fluid
from fom.errors import Fluid404Error
fdb = Fluid()

for thing in sys.argv[1:]:
    about = hashlib.md5(thing).hexdigest()
    print thing, about
    try:
        for tag in fdb.about[about].get().value['tagPaths']:
            if tag.endswith('/knows'):
                print '\t', tag
    except Fluid404Error:
        print '\tunknown'
 

So given an email address, we can run the above and see who else knows (or claims to) that email address.

We find all this quite thought provoking. Without going into details of the social side of this, it’s worth pointing out that Fluidinfo makes this kind of information sharing very easy because it has a guaranteed writable object for everything, including all MD5 sums. Because the fluiddb/about tag is unique and isn’t owned by anyone, any user can add their knows tag to the object for any MD5 sum. The ability for users and applications to work independently and yet to share information by just following a fluiddb/about convention is one of the coolest things about Fluidinfo.

Finally, note that this system does not guarantee privacy. If someone already knows an email address or phone number (etc) they can compute its MD5 sum and examine the Fluidinfo tags on the corresponding object. Doing so they might see a rustlem/knows tag and would then be free to draw their own conclusion.

You can play too. All you need is a Fluidinfo account and the above code. Please let us know how you get on. For example, you can freely tweet any MD5 sums we have in common. We’re going to use the hashtag #incommon, like this.

How to make an API in Fluidinfo

Wednesday, February 23rd, 2011

It’s very simple really:

1. Register a domain/user on Fluidinfo

Start here. If you’re registering a domain name then we will require proof of ownership (the instructions explaining how to do this are very simple).

2. Create your namespaces and tags

Be careful, you’re choosing how your data will be structured in Fluidinfo. Some tips we’ve found useful:

  • Flat is good.
  • Use namespaces to differentiate between the different sorts of things you’ll be tagging (e.g. between books and authors).
  • Copy conventions (how do others organise their data?).
  • KISS! Keep it simple (stupid!).

3. Import your data into Fluidinfo

To help you, we have a Python based script/library called FLIMP. Nevertheless, there are lots of freely available libraries that you may want to adapt yourself.

4. Announce your new API

Programmers will interact with your data via the general Fluidinfo API, which is simple and well documented. All you need to do is tell the world that your data is available, and what namespaces and tags you’re using to store it in Fluidinfo.

That’s it.

Please feel free to get in touch at any time if you have any questions or would like to explore the possibility of Fluidinfo Inc. helping you to add your data to Fluidinfo.