Archive for the ‘Progress’ Category

See O’Reilly book, author & Radar content in context as you browse

Monday, April 30th, 2012

This video shows how the Fluidinfo Chrome extension displays relevant content, in context, while you browse:

We’ve added information to Fluidinfo for all the tags O’Reilly Radar have used on their articles since 2005. For example, nine articles were tagged with “patent reform”. Select those words anywhere you run into them on the web and a pop-up will show you links to the Radar posts. Content on almost 4000 topics that have been discussed on Radar is now just a click away. Because pop-ups are triggered when you select text, they are only displayed when and where relevant.

As you’ll see in the video, we’ve also added information about all O’Reilly books and authors.

To make things easier for first-timers, if you simply install the extension you’ll see all the O’Reilly content with no need to configure anything. You can log in and adjust things later if you like it and want to customize what you see. Give it a whirl—it’s fun and interesting.

How we built the O’Reilly API using Fluidinfo

Tuesday, March 22nd, 2011

In case you haven’t noticed, we’ve imported the O’Reilly catalogue into Fluidinfo thus giving them an instantly writable API for their data.

How did we do it..?

There were three basic steps:

  1. Get the raw data.
  2. Clean the raw data.
  3. Import the cleaned data.

That’s it!

I’ll take each step in detail…

Get the raw data

Since we didn’t have an existing raw dump of the data nor access to O’Reilly’s database we had to think of some other way to get the catalogue. We found that O’Reilly had two different existing data services we could use: OPMI (O’Reilly Product Metadata Interface) and an affiliate’s API within Safari.

Unfortunately the RDF returned from OPMI is complicated. We’d either have to become experts in RDF or learn how to use a specialist library to get at the data we were interested in. We didn’t have time to pursue either of these avenues. The other alternative, the Safari service, just didn’t work as advertised. :-(

Then we remembered learning about @frabcus and @amcguire62’s ScraperWiki project.

Put simply, ScraperWiki allows you to write scripts that scrape (extract) information from websites and store the results for retrieval later. The “wiki” aspect of the ScraperWiki name comes from its collaborative development environment where users can share their scripts and the resulting raw data.

In any case, a couple of hours later I had the beginnings of a batched up script for scraping information from the O’Reilly catalogue on the oreilly.com website. After some tests and refactoring ScraperWiki started to do its stuff. The result was a data dump in the easy to understand and manipulate CSV or JSON formats. ScraperWiki saves the day!

Clean the raw data

This involved massaging the raw data into a meaningful structure that corresponded to the namespaces, tags and tag-values we were going to use in Fluidinfo. We also extracted some useful information from the raw data. For example, we made sure the publication date of each work was also stored in a machine-readable value. Finally, we checked that all the authors and books matched up.

Most of this work was done by a single Python script. It loaded the raw data (in JSON format), cleaned it and saved the cleaned data as another JSON file. This meant that we could re-clean the raw data any number of times when we got things wrong or needed to change anything. Since this was all done in-memory it was also very fast.

The file containing the cleaned data was simply a list of JSON objects that mapped to objects in Fluidinfo. The attributes of each JSON object corresponded to the tags and associated values to be imported.
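
To give a flavour of the clean-and-save cycle described above, here is a minimal Python sketch. It is not the script we actually ran: the raw field names (`title`, `published`), the raw date format ("April 15, 2009") and the sample record are all made up for illustration. Only the output tag names (publication-date, publication-month, publication-day) come from the tag list later in this archive.

```python
import json
from datetime import datetime


def clean_record(raw):
    """Return a cleaned copy of one raw record (raw field names are hypothetical)."""
    cleaned = {"title": raw["title"].strip()}
    # Assumed raw date format, e.g. "April 15, 2009"; a real dump may differ.
    published = datetime.strptime(raw["published"].strip(), "%B %d, %Y")
    # Store the date in machine-readable forms as well.
    cleaned["publication-date"] = published.strftime("%Y-%m-%d")
    cleaned["publication-month"] = published.month
    cleaned["publication-day"] = published.day
    return cleaned


def clean_dump(raw_json):
    """Clean a whole JSON dump in memory and return the cleaned JSON text."""
    records = json.loads(raw_json)
    return json.dumps([clean_record(record) for record in records], indent=2)


# A made-up raw record, purely for demonstration.
raw_json = json.dumps([{"title": " Example Book ", "published": "April 15, 2009"}])
cleaned = json.loads(clean_dump(raw_json))
```

Because the raw dump is never modified, the cleaning function can be changed and re-run as often as needed.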

Import the cleaned data

This stage took place in two parts:

  1. Create the required namespaces and tags
  2. Import the data by annotating objects

Given the cleaned data we were able to create the required namespaces and tags. You can see the resulting tree-like structure in the Fluidinfo explorer (on the left hand side).

Next, we simply iterated over the list of JSON objects and pushed them into Fluidinfo. (It’s important to note that network latency means that importing data can seem to take a while. We’re well aware of this and will be blogging about best practices at a later date.)
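
The two parts can be sketched roughly as follows. This is an illustrative outline only, not the FOM or flimp code: the helper names `flatten` and `plan_import`, the sample record and the idea that nested JSON objects become child namespaces are assumptions. The network calls that create namespaces and tags and push the values are deliberately left out.

```python
def flatten(record, namespace):
    """Yield (tag_path, value) pairs; nested dicts are treated as child namespaces."""
    for key, value in record.items():
        path = namespace + "/" + key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def plan_import(records, root):
    """Return the (namespaces, tags) that must exist before objects are annotated."""
    namespaces, tags = {root}, set()
    for record in records:
        for path, _ in flatten(record, root):
            tags.add(path)
            # Every path segment between the root and the tag is a namespace.
            parent = path.rsplit("/", 1)[0]
            while parent not in namespaces:
                namespaces.add(parent)
                parent = parent.rsplit("/", 1)[0]
    return sorted(namespaces), sorted(tags)


# A made-up cleaned record, for demonstration only.
records = [{"name": "Example Author", "expertise": {"python": True}}]
namespaces, tags = plan_import(records, "oreilly.com/authors")
pairs = sorted(flatten(records[0], "oreilly.com/authors"))
```

Part one creates everything in `namespaces` and `tags`; part two walks the records again and sends each pair from `flatten` to the object in question.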

That’s it!

We used Ali Afshar’s excellent FOM (Fluid Object Mapper) library both to create the namespaces and tags and to import the JSON objects into Fluidinfo. We also used elements of flimp (the FLuid IMPorter) to push the JSON into FOM.

What have we learned..? The most time consuming part of the exercise was scraping the data. The next most time consuming aspect was agreeing how to organise it. The actual import of the data didn’t take long at all.

Given access to the raw data and a well thought out schema we could have done this in an afternoon.

The structure of O’Reilly book and author data in Fluidinfo

Monday, March 21st, 2011

This short post explains how the O’Reilly catalog is represented in Fluidinfo.

Put simply, we annotate two types of object: those representing products (usually books) and those representing authors. We annotate them using namespaces and tags within the oreilly.com top level namespace so you can be sure that this is bona fide O’Reilly information.

Within the oreilly.com namespace we store a bunch of “top level” tags that describe a product in the O’Reilly catalogue (title, summary, URL and so on). The oreilly.com namespace has two child namespaces: “authors” and “media”. (If you want a visual representation of this structure head on over to the Fluidinfo explorer and explore, starting from the tree menu on the left hand side.)

The authors namespace contains tags that define information about an author (name, biography, homepage and so on) and also contains a child namespace called “expertise”. The expertise namespace contains a set of tags that map to the list of areas of expertise that O’Reilly uses to categorise their authors. So, for example, an object representing the O’Reilly author “Chris DiBona” looks like this:

Notice how Chris’s object has tags under the oreilly.com/authors namespace including several under the oreilly.com/authors/expertise namespace. Importantly, the object also has tags that were not provided by the O’Reilly data. Terry has added a tag terrycojones/met to indicate (rather obviously) that he’s met Chris and the fluiddb/about tag is used to indicate that the object is about the author called Chris DiBona.

What about the objects that represent books..? What do they look like..? Well let’s consider a current favourite of mine: “XMPP: The Definitive Guide”. Here’s how Nick Radcliffe’s excellent abouttag utility displays the object representing this book:

Whoa! Lots more tags! Many of them are from the oreilly.com domain (although notice how there are 15 missing). Once again it’s possible to see who/what else has been tagging the object. I’ve added a review and rating (ntoll/review and ntoll/rating) and various other people have annotated useful information that wasn’t at first in the dataset provided by O’Reilly.

How are authors and books linked..?

Every author object has an oreilly.com/authors/works tag that contains a list of the 13-digit O’Reilly ID / ISBN for each work they were involved in. Every book object has a corresponding oreilly.com/id and oreilly.com/isbn tag.

Alternatively, every book object has an oreilly.com/authors-urls tag that contains a list of its authors’ homepages on the O’Reilly website and every author object has an associated oreilly.com/url containing the same information.
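
As a rough illustration of the first kind of link, the Python sketch below joins an author to their books in memory. The objects are plain dictionaries with made-up names and IDs, the helper `books_by_author` is hypothetical, and it assumes the works list lives under the oreilly.com/authors namespace. In practice you would fetch the tag values from Fluidinfo first.

```python
def books_by_author(author, books):
    """Return the books whose oreilly.com/id appears in the author's works list."""
    index = {book["oreilly.com/id"]: book for book in books}
    return [
        index[work_id]
        for work_id in author["oreilly.com/authors/works"]
        if work_id in index
    ]


# Made-up objects for demonstration; the IDs are not real ISBNs.
books = [
    {"oreilly.com/id": "9780000000001", "oreilly.com/title": "First Example Book"},
    {"oreilly.com/id": "9780000000002", "oreilly.com/title": "Second Example Book"},
]
author = {
    "oreilly.com/authors/name": "Example Author",
    "oreilly.com/authors/works": ["9780000000002", "9780000000099"],
}
titles = [book["oreilly.com/title"] for book in books_by_author(author, books)]
```

IDs in the works list that have no matching book object (the second one here) are simply skipped.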

Finally, for the sake of completeness here’s a list of all the book and author tags along with a description of what each one represents:

Book tags

  • publication-day – The day of the month upon which the item was published.
  • publication-month – The number of the month within which the item was published.
  • duration – The duration of this item in minutes.
  • subtitle – The subtitle associated with the item.
  • id – The unique ID used by O’Reilly to identify the item, usually the 13-digit ISBN number (as a string).
  • page-count-is-estimate – A flag to indicate that any associated page count value is only an estimate.
  • cover-medium – The URL for a medium size image of the cover at the oreilly.com domain.
  • toc – The table of contents as text/html.
  • homepage – A URL to the item’s homepage on the O’Reilly website.
  • description – A long description of the item as text/html.
  • cover-small – The URL for a small size image of the cover at the oreilly.com domain.
  • author-urns – A list of unique reference numbers used by O’Reilly to reference the authors of the item.
  • cover-large – The URL for a large size image of the cover at the oreilly.com domain.
  • isbn – The 13-digit ISBN number (as a string).
  • safari-url – A URL to the item’s page on O’Reilly’s Safari service.
  • author-urls – A list of URLs pointing to the author’s homepages on the O’Reilly website.
  • pages – The number of pages this item has.
  • publisher – The name of the publisher of the item.
  • price-us – The advertised US price in cents.
  • title – The title of the item.
  • author-names – A list of author names.
  • summary – A short summary of the item as text/html.
  • publication-date – The publication date as YYYY-MM-DD.
  • price-uk – The advertised UK price in pence.
  • media – A list of the type[s] of media in which the item is available. Can be one or more of: ‘up-to-date’, ‘rough cut’, ‘dvd’, ‘ebook’, ‘kit’, ‘video’, ‘print’, ‘early release ebook’, ‘safari books online’ or ‘merchandise’.

Author tags

  • name – The author’s full name.
  • url – A URL to the author’s homepage on the O’Reilly website.
  • photo – A path to an image file containing a photo of the author hosted at the oreilly.com domain.
  • twitter – The author’s Twitter username.
  • works – A list of the ids of items that the author has created.
  • expertise – A list of the expertise tags associated with the author.
  • biography – The author’s biography as text/html.

Marc Hedlund joins the Fluidinfo board

Wednesday, February 9th, 2011

We’re really happy to announce that Marc Hedlund has joined the Fluidinfo board!

I’ve gotten to know Marc slowly over the last 10 years. We first met very briefly when he was CEO of Popular Power, a San Francisco start-up. Nelson Minar (Marc’s co-founder) and Derek Smith, two of my close friends who are very close to Marc, were both working there. Nelson and Derek, as well as several others including Fluidinfo investor and advisor Tim O’Reilly, have sky-high opinions of Marc. Hearing regular off-the-charts superlatives about Marc over the years always kept me interested to someday know him better.

Marc was present at my first ever (abysmal!) solo VC pitch for Fluidinfo, to the ill-fated Bryce Roberts and Mark Jacobson of OATV in early 2007. During the presentation, Marc interrupted to ask if he could take a photo of my slide titled “Revenue”. I think he wanted it as an example of how not to pitch a VC. I’ve never forgotten. He snapped the pic, resumed his seat, and told me to carry on :-)

Marc has a ton of experience. He founded and led Lucas Online, the internet subsidiary of Lucasfilm, was director of engineering at Organic Online, and was also CTO at Webstorm. After Popular Power he was VP of Engineering at Sana Security, and then Entrepreneur in Residence at OATV, gaining intimate knowledge of the world of venture capital and interacting with hundreds of start-up companies. Marc then co-founded Wesabe where he was Chief Product Officer before becoming CEO. These days he’s Chief Product Officer at Daylife in New York.

As you can probably imagine, we’re honored and excited to have Marc involved at Fluidinfo.

How we made an API for BoingBoing in an evening

Thursday, January 27th, 2011

Yesterday the folks over at boingboing.net posted eleven years’ worth of posts as a zipped up XML file. XML is good, but having a searchable database of posts is better. So I (ntoll) am in the process of importing all the data into Fluidinfo. :-)

When finished, every post and author in the boingboing data dump will be represented by an object in Fluidinfo and tagged with useful information. The diagram below shows a representation of what a typical object about a boingboing.net post looks like:

Tags on an object representing a boingboing.net post.

The object (the red blob with a unique ID written inside it) has several tags attached to it (named “boingboing.net/author” and “boingboing.net/comment_count” for example) with associated values (“Mark Frauenfelder” and “53” respectively).

Furthermore, while I was cleaning/preparing the data for upload I made sure to extract every domain name and URL referenced in each post and annotate the publication date as computer friendly values rather than just a human readable date.

An instant win is the ability to query data. For example, you’ll be able to search for all posts that link to techcrunch.com written in 2010 by Cory Doctorow. This is how to write the query in Fluidinfo’s super simple query language:

boingboing.net/domains contains "techcrunch.com" and
boingboing.net/year = 2010 and
boingboing.net/author = "Cory Doctorow"

The result will depend on how you make the query, but let’s assume you’re using a /values based call in Fluidinfo’s REST api and you’ve asked for each post’s title, publication date and a list of domains mentioned. You’ll get back some JSON encoded data that looks something like this:

{
    "results" : {
        "id" : {
            "05eee31e-fbd1-43cc-9500-0469707a9bc3" : {
                "boingboing.net/title" : {
                    "value" : "This is a made up title for illustrative purposes"
                },
                "boingboing.net/created_on" : {
                    "value" : "2010-08-19 13:23:41"
                },
                "boingboing.net/domains" : {
                    "value" : [
                        "techcrunch.com",
                        "microsoft.com"
                    ]
                }
            },
            "0521e31e-fbd1-43cc-9500-046974569bc3" : {
                … more results …
            }
        }
    }
}
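
To make the round trip a little more concrete, here is a rough Python sketch that builds the URL for such a /values request and pulls the titles out of a reply shaped like the one above. The host name in `BASE`, the helper names `values_url` and `titles`, and the `query` and `tag` parameter names are assumptions for illustration; check the API docs for the exact details. No request is actually sent.

```python
import json
from urllib.parse import urlencode

BASE = "http://fluiddb.fluidinfo.com"  # assumed host, adjust as needed


def values_url(query, tags):
    """Build a /values URL carrying the query plus one 'tag' parameter per tag."""
    params = [("query", query)] + [("tag", tag) for tag in tags]
    return BASE + "/values?" + urlencode(params)


def titles(response_text):
    """Map each object id in a /values style reply to its boingboing.net/title."""
    payload = json.loads(response_text)
    return {
        object_id: tag_values["boingboing.net/title"]["value"]
        for object_id, tag_values in payload["results"]["id"].items()
    }


query = (
    'boingboing.net/domains contains "techcrunch.com" and '
    "boingboing.net/year = 2010 and "
    'boingboing.net/author = "Cory Doctorow"'
)
url = values_url(
    query,
    ["boingboing.net/title", "boingboing.net/created_on", "boingboing.net/domains"],
)

# A hand-written reply in the same shape as the example above.
sample_reply = json.dumps({
    "results": {
        "id": {
            "05eee31e-fbd1-43cc-9500-0469707a9bc3": {
                "boingboing.net/title": {
                    "value": "This is a made up title for illustrative purposes"
                }
            }
        }
    }
})
found = titles(sample_reply)
```

The query string has to be URL-encoded because it contains spaces, quotes and slashes; `urlencode` takes care of that.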
 



Wait a minute..!?!? This is just as if boingboing.net had an API.

Actually, by importing the flat XML file into Fluidinfo they do have an API – for free! Because of Fluidinfo’s open nature anyone can now make use of boingboing’s data via a few simple and easy to construct RESTful calls to Fluidinfo.

But that’s not all..!

Fluidinfo isn’t just openly readable – it’s openly writeable too.

Huh..?

Any user of Fluidinfo can tag data to any object. For example, I control a couple of tags called “ntoll/rating” and “ntoll/comment” which I could attach to any of the objects representing boingboing.net posts. By tagging an object with associated values I’m indicating what I thought about the post.

Importantly, I know which object I want to tag because it has a special unique tag called “about” whose value is the URL to the boingboing.net post in question. Other people who want to add information about this post will know to use the same object as me because the about tag-value tells them, er, what the object is about.

This brings me to the killer point: accessing data from boingboing.net is good, but the facility to annotate, discover and re-use everyone’s data about boingboing.net posts is better. That’s why we sometimes say we’re trying to do to databases what Wikipedia did to encyclopaedias.

Users of Fluidinfo don’t have to retrieve information about boingboing.net posts by building queries using just boingboing.net tags. It’s possible to search using other people’s tags. For example, here’s how to search for posts that I’ve given a relatively high rating and commented on:

ntoll/rating > 6 and has ntoll/comment and
has boingboing.net/title

And users don’t have to just ask for boingboing.net related tag-values either. It’s possible to ask objects for all their tags that you have permission to see. For example, you could retrieve a matching post’s title, body, author and any comments I make about the post with the ntoll/comment tag.

I’m only scratching the surface here so I’ll follow up with another post soon with some example code and use cases. In the meantime, if you want to find out more feel free to get in touch with us. We’re more than happy to help.

If you’re a developer and want to play with the boingboing.net data you should take a read of my last post explaining how to explore Fluidinfo’s API with Python.

In case you were wondering, it really was only half an evening’s work to prepare the data and write the import script. :-)

Note: The import is currently running but should be complete later this afternoon. Not all posts will be in Fluidinfo yet (so far we have everything up to the end of September 2008).

Image credits: Diagram generated by abouttag written by Nick Radcliffe and the “API Sign” is © 2006 ulybug under a Creative Commons license.

Importing data into FluidDB with Flimp

Friday, November 19th, 2010

We’d like to introduce you to “Flimp” (the FLuiddb IMPorter) – a tool that makes it easy to import data into FluidDB.

It works in two ways:

  1. Given a source file containing a data dump (in either json, yaml or csv format), Flimp will create the necessary FluidDB namespaces and tags and then import the records. (We expect to provide more file formats soon.)
  2. Given a filesystem path, Flimp will create the necessary FluidDB namespaces (based on directories) and tags (based on file names) and then import file contents as values tagged on a single FluidDB object.
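
The second mode is easier to picture with a small example. The Python sketch below is not Flimp’s implementation; it only illustrates the mapping described above (directories become namespaces, file names become tags, file contents become values). The helper name `plan_filesystem_import`, the `example-user/files` namespace and the choice to keep file names unchanged, extension included, are assumptions.

```python
import os
import tempfile


def plan_filesystem_import(root_dir, root_namespace):
    """Map every file under root_dir to a tag path and return {tag_path: contents}."""
    plan = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        relative = os.path.relpath(dirpath, root_dir)
        # Sub-directories become child namespaces of the root namespace.
        parts = [] if relative == "." else relative.split(os.sep)
        for filename in filenames:
            tag_path = "/".join([root_namespace] + parts + [filename])
            with open(os.path.join(dirpath, filename), "rb") as handle:
                plan[tag_path] = handle.read()
    return plan


# Build a tiny throwaway directory tree to demonstrate the mapping.
with tempfile.TemporaryDirectory() as root_dir:
    os.makedirs(os.path.join(root_dir, "docs"))
    with open(os.path.join(root_dir, "notes.txt"), "w") as handle:
        handle.write("hello")
    with open(os.path.join(root_dir, "docs", "readme.txt"), "w") as handle:
        handle.write("read me")
    plan = plan_filesystem_import(root_dir, "example-user/files")
```

Every value in `plan` would then be tagged onto the single object chosen for the import.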

Flimp can be configured to do custom pre-processing (e.g. cleaning, normalizing or modifying) before data is imported into FluidDB. It’s important to note that Flimp is in active development and that we welcome comments, ideas, and bug reports. Flimp is built on fom (the Fluid Object Mapper) created by my colleague Ali Afshar.

As a test, we’ve imported all the metadata from data.gov and data.gov.uk using Flimp and made it publicly readable. The rest of this article explains exactly how we did it so you can also start importing data into FluidDB using Flimp.

Open Government Data

Open linked government data

source: http://www.flickr.com/photos/opensourceway/4371001268/

Governments are making their data openly available to citizens. This has resulted in a tidal wave of hitherto unavailable information flowing onto the Internet.

Unfortunately, it’s very easy to be swamped by both the sheer amount and diversity of what is available. Furthermore, despite progress in this area, it is still difficult to search and explore the data. Plus, governments publish data in many different ways making it difficult to link, annotate and search datasets.

Both the US and UK government data sites provide a dump of their metadata (data describing the data they have available). Finding this invaluable information is hard, so for the record here’s a link to the US dump and here’s a link to the UK dump. These are the sources Flimp imported into FluidDB. No doubt there are more from other governments and when found they’ll also mysteriously find their way into FluidDB.

Get Flimp

Flimp is written in the Python programming language. You’ll need to have this installed first along with setuptools. Once you have these requirements there are two ways to get Flimp:

  1. If you want the latest and greatest “bleeding edge” version then go visit the project’s website and follow the appropriate links/instructions.
  2. If you’d rather use the current packaged stable release then follow the instructions below. The rest of this article deals with Flimp version 0.6.1.

To install the latest stable release open a terminal and issue the following commands (Flimp depends on fom and PyYaml):

$ easy_install fom
$ easy_install PyYaml
$ easy_install flimp

Once installed you can check Flimp has installed correctly by using the “flimp” command like this:

$ flimp --version
flimp 0.6.1

That’s it! You have both the “flimp” command line tool installed and the associated libraries used for importing data into FluidDB.

Help is always available via the command line tool:

$ flimp --help
Usage: flimp [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -f FILE, --file=FILE  The FILE to process (valid filetypes: .json, .csv,
                        .yaml)
  -d DIRECTORY, --dir=DIRECTORY
                        The root directory for a filesystem import into
                        FluidDB
  -u UUID, --uuid=UUID  The uuid of the object to which the filesystem import
                        is to attach its tags
  -a ABOUT, --about=ABOUT
                        The about value of the object to which the filesystem
                        import is to attach its tags
  -p, --preview         Show a preview of what will happen, don't import
                        anything
  -i INSTANCE, --instance=INSTANCE
                        The URI for the instance of FluidDB to use
  -l LOG, --log=LOG     The log file to write to (defaults to flimp.log)
  -v, --verbose         Display status messages to console
  -c, --check           Validate the data file containing the data to import
                        into FluidDB - don't import anything

Importing from data.gov.uk

First, we registered the user “data.gov.uk”. Because we’ll be using tags only associated with the data.gov.uk user you can be sure that the source of the data is legitimate. (We’d love this user to be under the control of someone from data.gov.uk – contact us if this applies to you.)

Next, we downloaded a json dump of the UK’s metadata. A quick look at the raw file indicated that it was already in a remarkably good state but we wanted to make sure. Flimp helps out:


$ flimp --file=uk_data_dump.json --check
Working... (this might take some time, why not: tail -f the log?)
The following MISSING fields were found:

geographical_granularity
temporal_coverage-from
temporal_coverage_to
geographic_granularity
temporal_coverage_from
taxonomy_url
import_source
temporal_coverage-to

Full details in the missing.json file

Flimp uses the first item in the json dump as a template for the schema. The “--check” flag tells Flimp to make sure all the items match the schema. In this case we notice that some items don’t have all the fields. This isn’t a problem and if we were to open the “missing.json” file then we’d see which items these are. Importantly, Flimp also checks if any of the items have extra fields associated with them. This would be more of an issue but Flimp would help by giving details of the problem items allowing you to rectify the problem.
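
The idea behind the check is simple enough to sketch in a few lines of Python. This is an illustration of the approach just described (first item as template, then report missing and extra fields), not Flimp’s own code. The function name `check_records` and the sample records are made up.

```python
def check_records(records):
    """Compare each record's fields with those of the first record (the template)."""
    template = set(records[0])
    missing, extra = {}, {}
    for position, record in enumerate(records):
        fields = set(record)
        if template - fields:
            # Fields the template has but this record lacks.
            missing[position] = sorted(template - fields)
        if fields - template:
            # Fields this record has that the template does not know about.
            extra[position] = sorted(fields - template)
    return missing, extra


# Made-up records for demonstration.
records = [
    {"id": "a", "title": "First", "taxonomy_url": "http://example.com"},
    {"id": "b", "title": "Second"},
    {"id": "c", "title": "Third", "taxonomy_url": "http://example.com", "notes": "surprise"},
]
missing, extra = check_records(records)
```

Missing fields are usually harmless (the object just won’t get that tag), whereas extra fields have no tag waiting for them, which is why they deserve a closer look.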

It is also possible to preview what Flimp would do when importing the data:

$ flimp --file=uk_data_dump.json --preview
FluidDB username: data.gov.uk
FluidDB password:
Absolute Namespace path (under which imported namespaces and tags will be created): data.gov.uk/meta
Name of dataset (defaults to filename) [uk_data_dump]: data.gov.uk:metadata
Key field for about tag value (if none given, will use anonymous objects): id
Description of the dataset: Metadata from data.gov.uk
Working... (this might take some time, why not: tail -f the log?)
Preview of processing 'uk_data_dump.json'

The following namespaces/tags will be generated.

data.gov.uk/meta/relationships
data.gov.uk/meta/ratings_average
data.gov.uk/meta/maintainer
data.gov.uk/meta/name
data.gov.uk/meta/license
data.gov.uk/meta/author
data.gov.uk/meta/url
data.gov.uk/meta/notes
data.gov.uk/meta/title
data.gov.uk/meta/maintainer_email
data.gov.uk/meta/author_email
data.gov.uk/meta/state
data.gov.uk/meta/version
data.gov.uk/meta/resources
data.gov.uk/meta/groups
data.gov.uk/meta/ratings_count
data.gov.uk/meta/license_id
data.gov.uk/meta/revision_id
data.gov.uk/meta/id
data.gov.uk/meta/tags
data.gov.uk/meta/extras/national_statistic
data.gov.uk/meta/extras/geographic_coverage
data.gov.uk/meta/extras/geographical_granularity
data.gov.uk/meta/extras/external_reference
data.gov.uk/meta/extras/temporal_coverage-from
data.gov.uk/meta/extras/temporal_granularity
data.gov.uk/meta/extras/date_updated
data.gov.uk/meta/extras/agency
data.gov.uk/meta/extras/precision
data.gov.uk/meta/extras/geographic_granularity
data.gov.uk/meta/extras/temporal_coverage_to
data.gov.uk/meta/extras/temporal_coverage_from
data.gov.uk/meta/extras/taxonomy_url
data.gov.uk/meta/extras/import_source
data.gov.uk/meta/extras/temporal_coverage-to
data.gov.uk/meta/extras/department
data.gov.uk/meta/extras/update_frequency
data.gov.uk/meta/extras/date_released
data.gov.uk/meta/extras/categories

4023 records will be imported into FluidDB

The “--preview” flag does exactly what you’d expect: it asks you the same questions as if you were importing the data for real but instead lists the new namespace/tag combinations that will be created and the number of new objects to be annotated.

It’s important to understand how Flimp generates the “about” tag value (unsurprisingly, the about tag value indicates what each object in FluidDB is about). It needs to be unique and descriptive of what the object represents. As a result Flimp asks you to identify a field in your data containing unique values and appends this to the end of the name of the dataset (in the example above, “id” was identified as the key field):


fluiddb/about = "data.gov.uk:1ea4bfa9-9ae1-4be0-ae73-e0c4a26caa6c"

If you don’t provide a field for unique values Flimp simply generates a new object without an associated “about” value.
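
In Python terms the rule looks something like the sketch below. It is a guess at the behaviour based on the example above, not Flimp’s code: the function name `about_value` is hypothetical, and the prefix passed in is simply the part before the colon in the example value.

```python
def about_value(prefix, record, key_field=None):
    """Return 'prefix:key' for the record, or None for an anonymous object."""
    if key_field is None:
        # No key field given: the object gets no about value.
        return None
    return "%s:%s" % (prefix, record[key_field])


record = {"id": "1ea4bfa9-9ae1-4be0-ae73-e0c4a26caa6c", "title": "Example dataset"}
named = about_value("data.gov.uk", record, "id")
anonymous = about_value("data.gov.uk", record)
```

The key field must hold a different value for every record, otherwise two records would end up sharing one object.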

Nicholas Radcliffe’s About Tag blog is a great source of further information about the emerging conventions surrounding the “about” tag.

Since Flimp has satisfied us that the json data was in a good state we simply issued the following command to start the actual import:

$ flimp --file=uk_data_dump.json
FluidDB username: data.gov.uk
FluidDB password:
Absolute Namespace path (under which imported namespaces and tags will be created): data.gov.uk/meta
Name of dataset (defaults to filename) [uk_data_dump]: data.gov.uk:metadata
Key field for about tag value (if none given, will use anonymous objects): id
Description of the dataset: Metadata from data.gov.uk
Working... (this might take some time, why not: tail -f the log?)

Notice how Flimp interrogates you for sensitive information so you don’t have to have username/password credentials stored in a configuration file.

After the import completed it left a record of exactly what it did in the “flimp.log” file located in the current directory.

Importing from data.gov

Just as with the UK data, we’ve used an appropriate FluidDB username for importing the US data: data.gov (and the same applies – the data.gov user should be under the control of someone from data.gov – please contact us if this applies to you).

We took a different approach to the US metadata. They provide either an rdf document or a csv file. Since Flimp understands csv we used this as the source.

We wanted to make sure that the headers in the csv file (which get transformed into the names of tags in FluidDB) were cleaned and normalized appropriately since they contained lots of whitespace and non-alphanumeric characters. The snippet of Python code below demonstrates how we re-used Flimp in our own import script to achieve this end.

from flimp.utils import process_data_list
from flimp.parser import parse_csv
from fom.session import Fluid

def clean_header(header):
    """
    A function that takes a column header and normalises / cleans it into
    something we'll use as the name of a tag
    """
    # remove leading/trailing whitespace, replace inline spaces with
    # underscores and any slashes with dashes.
    return header.strip().replace(' ', '_').replace('/', '-')

csv_file = open("data_gov.csv", "r")
data = parse_csv.parse(csv_file, clean_header)

# data now contains the normalized input from the csv file

# Use fom to create a session with FluidDB - remember flimp uses fom for
# connecting to FluidDB
fdb = Fluid() # defines a session with FluidDB
fdb.login('data.gov', 'secretpassword123') # replace these with something that works
fdb.bind()

root_path = 'data.gov/meta' # Namespace where imported namespaces/tags are created
name = 'data.gov:metadata' # used when creating namespace/tag descriptions
desc = 'Metadata from data.gov' # a description of the dataset
about = 'URL' # field whose value to use for the about tag

# the following function call imports the data
result = process_data_list(data, root_path, name, desc, about)
print(result)
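
If your headers are messier than ours, a stricter variant of clean_header may be useful. The sketch below is an assumption-laden alternative rather than what we ran: the name `normalise_header`, the sample headers and the set of characters it keeps (letters, digits, underscore, dash and dot) are all illustrative choices.

```python
import re


def normalise_header(header):
    """Turn a column header into something safer to use as a tag name."""
    cleaned = header.strip().replace("/", "-")
    # Collapse any run of whitespace (spaces, tabs, newlines) into one underscore.
    cleaned = re.sub(r"\s+", "_", cleaned)
    # Drop every character outside the assumed whitelist.
    return re.sub(r"[^A-Za-z0-9_.\-]", "", cleaned)


# Made-up headers for demonstration.
examples = [normalise_header(h) for h in ["  Agency / Program ", "Date Released (UTC)"]]
```

A function like this can be passed to the parser in place of clean_header, since it also takes one header string and returns the cleaned name.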
 

Conclusion

By importing the metadata into FluidDB we immediately gain the following:

  • FluidDB’s consistent, simple and elegant RESTful API as a view into the data.
  • The possibility of simple yet powerful queries across all the metadata.
  • The opportunity to annotate, link and augment the existing data with contributions from other sources.

Any application can now access the newly imported government data. In a future post I’ll demonstrate how to build a web-based interface for this data that is also hosted within FluidDB. I’ll also show how to query, annotate and link data yourself and re-use the contributions of others.

Coming soon to a FluidDB near you…

Monday, November 15th, 2010

Today (Monday 15th November) commencing from 10am GMT (11am Western Eurozone, 5am EST) the main instance of FluidDB will be offline for several hours while we roll out a major update.

We’re excited to announce the following new features and changes:

  • /about added to HTTP API – It will be possible to access FluidDB objects that have a fluiddb/about tag value with requests whose path starts with /about. For example, the object about “Barcelona” can be reached directly via /about/Barcelona. The behaviour of /about, when given an about value, is exactly like that of /objects when given an object id. More information will be available in the API docs at http://api.fluidinfo.com/. Many thanks to Holger Dürer (http://twitter.com/hd42) for suggesting this improvement.
  • /values added to HTTP API – It is now possible to manipulate multiple tag values in a single API request to /values via the PUT, GET and DELETE HTTP methods. From the user’s perspective, this will result in a significant improvement in performance. More information can be found in the API docs at http://api.fluidinfo.com/.
  • “SEE” permission replaced with “READ” – the permissions system has been simplified. FluidDB now uses only the READ permission on tags to decide whether API calls accessing the tag values should be allowed to proceed. Anything that used the SEE permission now uses READ. For example, when you do a GET on an object to retrieve the names of its tags, you will only receive those for which you have READ permission. Many thanks to Jamu Kakar (http://twitter.com/jkakar) for suggesting this simplification.
  • Deleting a tag instance now always returns an HTTP 204 (No content) code – DELETEing a tag value from an object that did not have that tag used to result in a “404 (Not found)” status. This will be changed to simply return the non-error “204 (No Content)”.
  • “Content-MD5” header for checking payload content – It will be possible to send a checksum of a payload to FluidDB via the “Content-MD5” header. FluidDB will attempt to validate the checksum with the payload and return a “412 (Precondition failed)” status in the case of a mismatch.
  • Cross Origin Resource Sharing (CORS) added to HTTP API – it will be possible to make cross origin requests as specified by http://www.w3.org/TR/cors/ rather than rely on JSONP. FluidDB will have an almost complete implementation of this emerging standard although we expect to make changes and improvements as the specification matures.
  • Text indexing of fluiddb/about tag values – text indexing is coming to FluidDB but is definitely a work in progress. This release is just the very first step: the fluiddb/about tag will be indexed from the update onwards (existing fluiddb/about tag values will be indexed over the coming days/weeks).
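
To make two of the items above concrete, here is a minimal Python sketch of how a client might build an /about path and compute a “Content-MD5” value. It is an illustration under stated assumptions, not code from FluidDB or FOM: the helper names about_path and content_md5 are hypothetical, the header value follows the general definition in RFC 1864 (base64 of the raw MD5 digest), and percent-encoding every reserved character in the about value is our guess at a safe default. Check the API docs for the encoding FluidDB actually expects.

```python
import base64
import hashlib
from urllib.parse import quote


def about_path(about_value: str) -> str:
    """Build a request path under /about for a fluiddb/about value.

    Hypothetical helper: percent-encoding all reserved characters
    (including "/") is an assumption, not documented FluidDB behaviour.
    """
    return "/about/" + quote(about_value, safe="")


def content_md5(payload: bytes) -> str:
    """Return a Content-MD5 header value as defined by RFC 1864.

    The value is the base64 encoding of the raw 16-byte MD5 digest,
    not the hexadecimal digest string.
    """
    digest = hashlib.md5(payload).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    body = b'{"value": 5}'
    headers = {
        "Content-Type": "application/json",  # illustrative header only
        "Content-MD5": content_md5(body),
    }
    print(about_path("Barcelona"))
    print(headers["Content-MD5"])
```

A client would send the computed value alongside the payload and, per the notes above, treat a “412 (Precondition Failed)” response as a sign that the payload was altered in transit.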

For those of you who have written or maintain a client library for FluidDB we’d like to refer you to the changes we’ve made to the Fluid Object Mapper (FOM) library as a reference for what you might want to do with your own library.

To encourage people to add the new FluidDB capabilities to libraries, we’re going to extend the FluidDB Weekend of Code offer to library authors. Let us know when you’re working on your library and where we can find it (Github, Bitbucket, Sourceforge etc) and we’ll order you a pizza and send you a book of your choice from Amazon.

Finally, we’re moving to a four-week development cycle so expect regular updates, pro-active bug squashing and lots of progress in the coming months. We’ve got lots of exciting stuff in the pipeline and we can’t wait to see how the FluidDB community reacts.

Fluidinfo receives an additional $170K in Series A second closing

Monday, August 23rd, 2010

We’re happy to announce a second closing of Fluidinfo’s Series A investment round. We’ve raised another $170K, taking the round to just under $1M in total. Some of the people investing in the second closing are:

Michael Parekh: A Wall Streeter for over 20 years and former partner at Goldman Sachs, Michael founded and helped to build the Internet Research effort at the firm (twitter, more info).

Esther Dyson: who seed funded Fluidinfo in late 2007, and who’s been a huge source of help ever since (and before). We’re thrilled to have her following on in this round (twitter, more info).

David Snow: the Editor in Chief of PEI Media. David also participated in the seed funding of Fluidinfo (twitter, more info).

Ted Carroll and Earl Macomber: who were both also seed-stage backers of Fluidinfo. Ted and Earl are the managing principals of traditional information and media focused private equity firm Noson Lawen Partners, and have again made personal investments (twitter, more info).

Ed Carroll: who was also a seed stage Fluidinfo investor. Ed is now entering his senior year in high school and hopes to attend USC next year as a freshman at The Marshall School. Ed spent a month at Marshall this summer and walked away with Top Five honors in their Entrepreneurialism program. Good luck Ed!

There are three other new investors who also came into the round, but who prefer not to be mentioned publicly at this stage (so you’ll have to ask us about them privately :-)). The above all join Betaworks, IA Ventures, RRE Ventures, Lerer Ventures, Chris Dixon & Founder Collective, Joshua Schachter, Andrew Rasiej, Ross Williams, and Esther Speight as Fluidinfo Series A investors.

Our thanks to everyone!

FluidDB enters alpha

Monday, May 24th, 2010

We’re using the TechCrunch Disrupt event to launch FluidDB into a real alpha. Until today we’ve only let a small number of people in to play with the API, and we’ve been giving away API passwords by hand. As of today, we’re taking the brakes off a little, allowing anyone to sign up and begin using the FluidDB API. Of course, to do that it will help enormously if you’re a programmer :-)

Although FluidDB has been up and running for 9 months, we’re being careful not to raise expectations too quickly. So for now we’re still labeling it an “alpha”. We have concrete plans for what will constitute a beta—these are mainly to do with speed and with adding flexibility to the API to reduce the number of calls apps have to make—and plan to be in beta by the end of 2010. Now that we have our funding cleared up, and can hire more developers, you can expect FluidDB development to ramp up quickly.

Please feel free to comment below. We’re listening!

Fluidinfo is a TechCrunch Disrupt finalist

Monday, May 24th, 2010

Fluidinfo has been selected as a finalist in the TechCrunch Disrupt Startup Battlefield taking place today in New York.

Over 500 companies from around the world applied to present at TechCrunch Disrupt, and only 20 were accepted. We’ll be on stage fighting for the right to call ourselves the most disruptive start-up on the planet :-)

It’s quite an accomplishment and an honor to be selected as a finalist. The entry process wasn’t simple: a written application, a 5-minute video, a phone interview with TC CEO Heather Harde, a couple of hours talking and demoing to Erick Schonfeld, a written script of a presentation (with lots of suggestions from Erick), and several live rehearsals. Ben Siscovick of IA Ventures is kindly helping with the live presentation. And along the way I had to reluctantly pull out of giving a presentation at the NY Tech Meetup. Thanks to John Borthwick of Betaworks and to Todd Levy of bit.ly for helping cover for me, and especially to Nate Westheimer, the NYTM organizer, for his understanding and support.

I’ve never really liked these startup competitions. The amount of time allotted to present always seems too little, and the audience too general. But more importantly, they’ve always seemed biased towards startups working on much simpler things, with snazzy UIs and demos – exactly the kind of thing we never had. But the theme of TechCrunch Disrupt was too compelling to ignore. In the demo video I submitted, I told them I had no demo and that real disruption will not necessarily arrive with a UI. To their credit, TC bought it and were courageous enough to consider Fluidinfo further, and to finally accept us. Erick Schonfeld was very thoughtful, supportive, and encouraging in this.

Hopefully the presentation will be available online – if so I’ll post a link here. I’ve been thinking about it and working on it for some time. I’m on stage sometime after 2:15pm (EST) today.

I think it’s going to go well. Hopefully by the time you read this we won’t have already been voted off the TC Disrupt island!