Fluidinfo

November 23, 2010

Watering a Peace Lily with Fluidinfo

Filed under: Awesomeness,Programming — Nicholas Tollervey @ 9:54 am

I (ntoll) belong to a nascent hackerspace called NortHACKton. It’s an opportunity to learn new skills and to collaborate with a great bunch of people who create cool stuff. I’m going to describe just such a collaboration with Stephen Bridges, one of the organisers of the hackerspace.

Our aim was to combine a simple hardware project with Fluidinfo and do it in such a way that others could repeat, extend and enhance what we’d been up to. We decided to connect an Arduino to a sensor and put the resulting reading into Fluidinfo at regular intervals. In the end we built something to make a moisture reading of the soil in Stephen’s plant pot and update a value in Fluidinfo every 10 minutes.

The Arduino has an Ethernet shield so the device can communicate autonomously with Fluidinfo via the HTTP API. The support circuitry is adapted from Botanicalls.com (Creative Commons) and Stephen created the sensor from a pen lid, sticky tape and a couple of wires. 🙂

The source code can be found on GitHub and contains two parts:

  1. A generic and reusable layer that handles basic interaction with Fluidinfo
  2. The application logic that takes the reading and controls the Arduino.

From Fluidinfo’s point of view, there is an object that represents Stephen’s peace lily (its about tag value is “Stephen’s Peace Lily (houseplant)”) and the tag widget/ffm/reading attached to this object is updated with the appropriate value.
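As a rough illustration of that update step, the sketch below builds (but does not send) the kind of HTTP request a client could make to store a reading. It is not the Arduino code from the GitHub repository. The host name, the /about path form, the content type and the sample reading of 512 are all assumptions made for illustration, and a real request would also need authentication.

```python
import json
from urllib.parse import quote

# Assumed values for illustration; check the API docs before relying on them.
BASE_URL = "http://fluiddb.fluidinfo.com"
VALUE_TYPE = "application/vnd.fluiddb.value+json"


def build_reading_update(about, tag_path, reading):
    """Describe a PUT that stores `reading` in `tag_path` on the object
    whose about value is `about`. Nothing is sent over the network."""
    # The about value is percent-encoded so it can sit safely in the URL path.
    url = "%s/about/%s/%s" % (BASE_URL, quote(about, safe=""), tag_path)
    return {
        "method": "PUT",
        "url": url,
        "headers": {"Content-Type": VALUE_TYPE},
        "body": json.dumps(reading),
    }


# 512 is a made-up sensor reading.
request = build_reading_update(
    "Stephen’s Peace Lily (houseplant)", "widget/ffm/reading", 512)
print(request["method"], request["url"])
print(request["body"])
```

Returning a plain description of the request keeps the example runnable without network access; any HTTP client could send it.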

Interestingly, I’ve also added some tags to the object representing the peace lily which hold html, css and javascript values. This is a classic case of putting information in context since the peace lily’s web page is a tag-value attached to its object in Fluidinfo. So it’s possible to view the peace lily’s current status with your browser.

The whole thing was hacked together in an afternoon over a drink in a pub in Northampton. Unfortunately for Stephen my mobile phone takes video so I press-ganged him into the following explanation:

You can find Stephen’s write-up on the NortHACKton wiki. If you’re interested in doing something similar with Fluidinfo please don’t hesitate to drop in on our IRC channel (#fluidinfo on Freenode – connect via the web) and ask questions. Alternatively, drop by either the fluidinfo-users or fluidinfo-discuss mailing lists. We’d be more than happy to help.

November 19, 2010

Importing data into FluidDB with Flimp

Filed under: Programming,Progress — Nicholas Tollervey @ 5:26 am

We’d like to introduce you to “Flimp” (the FLuiddb IMPorter) – a tool that makes it easy to import data into FluidDB.

It works in two ways:

  1. Given a source file containing a data dump (in either json, yaml or csv format), Flimp will create the necessary FluidDB namespaces and tags and then import the records. (We expect to provide more file formats soon.)
  2. Given a filesystem path, Flimp will create the necessary FluidDB namespaces (based on directories) and tags (based on file names) and then import file contents as values tagged on a single FluidDB object.
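To make the second mode more concrete, here is a minimal sketch of the directory-to-namespace and file-to-tag mapping it describes. This is not Flimp's own code: the function name tags_from_directory, the "example" root namespace and the choice to keep file extensions in tag names are assumptions made purely for illustration.

```python
import os
import tempfile


def tags_from_directory(root_dir, root_namespace):
    """Map each file under root_dir to a (tag path -> file contents) entry.

    Directories become namespaces and file names become tag names.
    """
    tags = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        rel = os.path.relpath(dirpath, root_dir)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        for filename in filenames:
            tag_path = "/".join([root_namespace] + parts + [filename])
            with open(os.path.join(dirpath, filename)) as handle:
                tags[tag_path] = handle.read()
    return tags


# Build a tiny throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "site", "css"))
    with open(os.path.join(root, "site", "index.html"), "w") as handle:
        handle.write("<h1>Peace lily</h1>")
    with open(os.path.join(root, "site", "css", "style.css"), "w") as handle:
        handle.write("h1 { color: green; }")
    mapping = tags_from_directory(root, "example")

for tag_path in sorted(mapping):
    print(tag_path)
```

In a real import every value in the mapping would be tagged onto one object, chosen with the --uuid or --about option.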

Flimp can be configured to do custom pre-processing (e.g. cleaning, normalizing or modifying) before data is imported into FluidDB. It’s important to note that Flimp is in active development and that we welcome comments, ideas, and bug reports. Flimp is built on fom (the Fluid Object Mapper) created by my colleague Ali Afshar.

As a test, we’ve imported all the metadata from data.gov and data.gov.uk using Flimp and made it publicly readable. The rest of this article explains exactly how we did it so you can also start importing data into FluidDB using Flimp.

Open Government Data

Image: Open linked government data (source: http://www.flickr.com/photos/opensourceway/4371001268/)

Governments are making their data openly available to citizens. This has resulted in a tidal wave of hitherto unavailable information flowing onto the Internet.

Unfortunately, it’s very easy to be swamped by both the sheer amount and diversity of what is available. Furthermore, despite progress in this area, it is still difficult to search and explore the data. Plus, governments publish data in many different ways making it difficult to link, annotate and search datasets.

Both the US and UK government data sites provide a dump of their metadata (data describing the data they have available). Finding this invaluable information is hard, so for the record here’s a link to the US dump and here’s a link to the UK dump. These are the sources Flimp imported into FluidDB. No doubt there are more from other governments and when found they’ll also mysteriously find their way into FluidDB.

Get Flimp

Flimp is written in the Python programming language. You’ll need to have this installed first along with setuptools. Once you have these requirements there are two ways to get Flimp:

  1. If you want the latest and greatest “bleeding edge” version then go visit the project’s website and follow the appropriate links/instructions.
  2. If you’d rather use the current packaged stable release then follow the instructions below. The rest of this article deals with Flimp version 0.6.1.

To install the latest stable release open a terminal and issue the following commands (Flimp depends on fom and PyYaml):

$ easy_install fom
$ easy_install PyYaml
$ easy_install flimp

Once installed you can check Flimp has installed correctly by using the “flimp” command like this:

$ flimp --version
flimp 0.6.1

That’s it! You have both the “flimp” command line tool installed and the associated libraries used for importing data into FluidDB.

Help is always available via the command line tool:

$ flimp --help
Usage: flimp [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -f FILE, --file=FILE  The FILE to process (valid filetypes: .json, .csv,
                        .yaml)
  -d DIRECTORY, --dir=DIRECTORY
                        The root directory for a filesystem import into
                        FluidDB
  -u UUID, --uuid=UUID  The uuid of the object to which the filesystem import
                        is to attach its tags
  -a ABOUT, --about=ABOUT
                        The about value of the object to which the filesystem
                        import is to attach its tags
  -p, --preview         Show a preview of what will happen, don't import
                        anything
  -i INSTANCE, --instance=INSTANCE
                        The URI for the instance of FluidDB to use
  -l LOG, --log=LOG     The log file to write to (defaults to flimp.log)
  -v, --verbose         Display status messages to console
  -c, --check           Validate the data file containing the data to import
                        into FluidDB - don't import anything

Importing from data.gov.uk

First, we registered the user “data.gov.uk”. Because we’ll be using tags only associated with the data.gov.uk user you can be sure that the source of the data is legitimate. (We’d love this user to be under the control of someone from data.gov.uk – contact us if this applies to you.)

Next, we downloaded a json dump of the UK’s metadata. A quick look at the raw file indicated that it was already in a remarkably good state but we wanted to make sure. Flimp helps out:


$ flimp --file=uk_data_dump.json --check
Working... (this might take some time, why not: tail -f the log?)
The following MISSING fields were found:

geographical_granularity
temporal_coverage-from
temporal_coverage_to
geographic_granularity
temporal_coverage_from
taxonomy_url
import_source
temporal_coverage-to

Full details in the missing.json file

Flimp uses the first item in the json dump as a template for the schema. The “--check” flag tells Flimp to make sure all the items match the schema. In this case we notice that some items don’t have all the fields. This isn’t a problem, and if we were to open the “missing.json” file we’d see which items these are. Importantly, Flimp also checks whether any of the items have extra fields. This would be more of an issue, but Flimp would help by giving details of the problem items so that you can fix them.
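The idea behind the check can be sketched in a few lines of Python. This is an illustration of the approach described above rather than Flimp's implementation: the function name check_records and the sample records are invented, and nested fields are ignored for brevity.

```python
def check_records(records):
    """Compare every record's fields against those of the first record.

    Returns two sets: fields some record lacks, and fields some record
    has that the first (template) record does not.
    """
    template = set(records[0])
    missing, extra = set(), set()
    for record in records[1:]:
        fields = set(record)
        missing |= template - fields
        extra |= fields - template
    return missing, extra


# Invented sample data: the second record lacks a field, the third adds one.
records = [
    {"id": "a", "title": "First", "taxonomy_url": "http://example.com/t"},
    {"id": "b", "title": "Second"},
    {"id": "c", "title": "Third", "taxonomy_url": "", "surprise": 1},
]
missing, extra = check_records(records)
print("missing:", sorted(missing))
print("extra:", sorted(extra))
```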

It is also possible to preview what Flimp would do when importing the data:

$ flimp --file=uk_data_dump.json --preview
FluidDB username: data.gov.uk
FluidDB password:
Absolute Namespace path (under which imported namespaces and tags will be created): data.gov.uk/meta
Name of dataset (defaults to filename) [uk_data_dump]: data.gov.uk:metadata
Key field for about tag value (if none given, will use anonymous objects): id
Description of the dataset: Metadata from data.gov.uk
Working... (this might take some time, why not: tail -f the log?)
Preview of processing 'uk_data_dump.json'

The following namespaces/tags will be generated.

data.gov.uk/meta/relationships
data.gov.uk/meta/ratings_average
data.gov.uk/meta/maintainer
data.gov.uk/meta/name
data.gov.uk/meta/license
data.gov.uk/meta/author
data.gov.uk/meta/url
data.gov.uk/meta/notes
data.gov.uk/meta/title
data.gov.uk/meta/maintainer_email
data.gov.uk/meta/author_email
data.gov.uk/meta/state
data.gov.uk/meta/version
data.gov.uk/meta/resources
data.gov.uk/meta/groups
data.gov.uk/meta/ratings_count
data.gov.uk/meta/license_id
data.gov.uk/meta/revision_id
data.gov.uk/meta/id
data.gov.uk/meta/tags
data.gov.uk/meta/extras/national_statistic
data.gov.uk/meta/extras/geographic_coverage
data.gov.uk/meta/extras/geographical_granularity
data.gov.uk/meta/extras/external_reference
data.gov.uk/meta/extras/temporal_coverage-from
data.gov.uk/meta/extras/temporal_granularity
data.gov.uk/meta/extras/date_updated
data.gov.uk/meta/extras/agency
data.gov.uk/meta/extras/precision
data.gov.uk/meta/extras/geographic_granularity
data.gov.uk/meta/extras/temporal_coverage_to
data.gov.uk/meta/extras/temporal_coverage_from
data.gov.uk/meta/extras/taxonomy_url
data.gov.uk/meta/extras/import_source
data.gov.uk/meta/extras/temporal_coverage-to
data.gov.uk/meta/extras/department
data.gov.uk/meta/extras/update_frequency
data.gov.uk/meta/extras/date_released
data.gov.uk/meta/extras/categories

4023 records will be imported into FluidDB

The “--preview” flag does exactly what you’d expect: it asks you the same questions as if you were importing the data for real but instead lists the new namespace/tag combinations that will be created and the number of new objects to be annotated.

It’s important to understand how Flimp generates the “about” tag value (unsurprisingly, the about tag value indicates what each object in FluidDB is about). It needs to be unique and descriptive of what the object represents. As a result Flimp asks you to identify a field in your data containing unique values and appends this to the end of the name of the dataset (in the example above, “id” was identified as the key field):


fluiddb/about = "data.gov.uk:1ea4bfa9-9ae1-4be0-ae73-e0c4a26caa6c"

If you don’t provide a field for unique values Flimp simply generates a new object without an associated “about” value.
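A minimal sketch of that rule follows. It is not taken from Flimp: the function name about_value is hypothetical, and the prefix passed in is an assumption chosen to match the example value shown above.

```python
def about_value(prefix, record, key_field=None):
    """Build an about value from a prefix and the record's key field.

    Returns None when no key field is given, mirroring the case where
    an object is created without an about value.
    """
    if key_field is None:
        return None
    return "%s:%s" % (prefix, record[key_field])


record = {"id": "1ea4bfa9-9ae1-4be0-ae73-e0c4a26caa6c", "title": "A dataset"}
print(about_value("data.gov.uk", record, "id"))
print(about_value("data.gov.uk", record))
```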

Nicholas Radcliffe’s About Tag blog is a great source of further information about the emerging conventions surrounding the “about” tag.

Since Flimp has satisfied us that the json data was in a good state we simply issued the following command to start the actual import:

$ flimp --file=uk_data_dump.json
FluidDB username: data.gov.uk
FluidDB password:
Absolute Namespace path (under which imported namespaces and tags will be created): data.gov.uk/meta
Name of dataset (defaults to filename) [uk_data_dump]: data.gov.uk:metadata
Key field for about tag value (if none given, will use anonymous objects): id
Description of the dataset: Metadata from data.gov.uk
Working... (this might take some time, why not: tail -f the log?)

Notice how Flimp interrogates you for sensitive information so you don’t have to have username/password credentials stored in a configuration file.

After the import completed it left a record of exactly what it did in the “flimp.log” file located in the current directory.

Importing from data.gov

Just as with the UK data, we’ve used an appropriate FluidDB username for importing the US data: data.gov (and the same applies – the data.gov user should be under the control of someone from data.gov – please contact us if this applies to you).

We took a different approach to the US metadata. They provide either an rdf document or a csv file. Since Flimp understands csv we used this as the source.

We wanted to make sure that the headers in the csv file (which get transformed into the names of tags in FluidDB) were cleaned and normalized appropriately since they contained lots of whitespace and non-alphanumeric characters. The snippet of Python code below demonstrates how we re-used Flimp in our own import script to achieve this end.

from flimp.utils import process_data_list
from flimp.parser import parse_csv
from fom.session import Fluid

def clean_header(header):
    """
    A function that takes a column header and normalises / cleans it into
    something we'll use as the name of a tag
    """
    # remove leading/trailing whitespace, replace inline whitespace with
    # underscore and any slashes with dashes.
    return header.strip().replace(' ', '_').replace('/', '-')

csv_file = open("data_gov.csv", "r")
data = parse_csv.parse(csv_file, clean_header)

# data now contains the normalized input from the csv file

# Use fom to create a session with FluidDB - remember flimp uses fom for
# connecting to FluidDB
fdb = Fluid() # defines a session with FluidDB
fdb.login('data.gov', 'secretpassword123') # replace these with something that works
fdb.bind()

root_path = 'data.gov/meta' # Namespace where imported namespaces/tags are created
name = 'data.gov:metadata' # used when creating namespace/tag descriptions
desc = 'Metadata from data.gov' # a description of the dataset
about = 'URL' # field whose value to use for the about tag

# the following function call imports the data
result = process_data_list(data, root_path, name, desc, about)
print(result)
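To see what the clean_header function does on its own, it can be run against a few column names without touching Flimp or FluidDB. The sample headers below are invented for illustration and are not taken from the data.gov file.

```python
def clean_header(header):
    """Same normalisation as in the import script above."""
    return header.strip().replace(' ', '_').replace('/', '-')


# Hypothetical headers with stray whitespace and a slash.
samples = ["  Agency Name ", "Date Released/Updated", "URL"]
for raw in samples:
    print(repr(raw), "->", clean_header(raw))
```

Characters other than spaces and slashes pass through unchanged, so a header containing other punctuation would need extra handling.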

Conclusion

By importing the metadata into FluidDB we immediately gain the following:

  • FluidDB’s consistent, simple and elegant RESTful API as a view into the data.
  • The possibility of simple yet powerful queries across all the metadata.
  • The opportunity to annotate, link and augment the existing data with contributions from other sources.

Any application can now access the newly imported government data. In a future post I’ll demonstrate how to build a web-based interface for this data that is also hosted within FluidDB. I’ll also show how to query, annotate and link data yourself and re-use the contributions of others.

November 15, 2010

Coming soon to a FluidDB near you…

Filed under: Awesomeness,Happiness,Programming,Progress — Nicholas Tollervey @ 4:51 am

Today (Monday 15th November) commencing from 10am GMT (11am Western Eurozone, 5am EST) the main instance of FluidDB will be offline for several hours while we roll out a major update.

We’re excited to announce the following new features and changes:

  • /about added to HTTP API – It will be possible to access FluidDB objects that have a fluiddb/about tag value with requests whose path starts with /about. For example, the object about “Barcelona” can be reached directly via /about/Barcelona. The behaviour of /about, when given an about value, is exactly like that of /objects when given an object id. More information will be available in the API docs at http://api.fluidinfo.com/. Many thanks to Holger Dürer (http://twitter.com/hd42) for suggesting this improvement.
  • /values added to HTTP API – It is now possible to manipulate multiple tag values in a single API request to /values via the PUT, GET and DELETE HTTP methods. From the user’s perspective, this will result in a significant improvement in performance. More information can be found in the API docs at http://api.fluidinfo.com/.
  • “SEE” permission replaced with “READ” – the permissions system has been simplified. FluidDB now uses only the READ permission on tags to decide whether API calls accessing the tag values should be allowed to proceed. Anything that used the SEE permission now uses READ. For example, when you do a GET on an object to retrieve the names of its tags, you will only receive those for which you have READ permission. Many thanks to Jamu Kakar (http://twitter.com/jkakar) for suggesting this simplification.
  • Deleting a tag instance now always returns an HTTP 204 (No content) code – DELETEing a tag value from an object that did not have that tag used to result in a “404 (Not found)” status. This will be changed to simply return the non-error “204 (No Content)”.
  • “Content-MD5” header for checking payload content – It will be possible to send a checksum of a payload to FluidDB via the “Content-MD5” header. FluidDB will attempt to validate the checksum with the payload and return a “412 (Precondition failed)” status in the case of a mismatch.
  • Cross Origin Resource Sharing (CORS) added to HTTP API – it will be possible to make cross origin requests as specified by http://www.w3.org/TR/cors/ rather than rely on JSONP. FluidDB will have an almost complete implementation of this emerging standard although we expect to make changes and improvements as the specification matures.
  • Text indexing of fluiddb/about tag values – text indexing is coming to FluidDB but is definitely a work in progress. This release is just the very first step: the fluiddb/about tag will be indexed from the update onwards (existing fluiddb/about tag values will be indexed over the coming days/weeks).
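For library authors, the Content-MD5 item above can be tried out locally. The sketch below computes the header value as the base64-encoded MD5 digest that RFC 1864 defines for Content-MD5; that FluidDB expects exactly this encoding is an assumption, and the function names are hypothetical.

```python
import base64
import hashlib


def content_md5(payload):
    """Return the base64-encoded MD5 digest of payload (bytes)."""
    return base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")


def checksum_matches(payload, header_value):
    """Mimic the check described above; a mismatch is the case that
    would lead to a 412 (Precondition failed) response."""
    return content_md5(payload) == header_value


payload = b'"a tag value"'
header = content_md5(payload)
print("Content-MD5:", header)
print(checksum_matches(payload, header))
print(checksum_matches(b'"corrupted in transit"', header))
```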

For those of you who have written or maintain a client library for FluidDB we’d like to refer you to the changes we’ve made to the Fluid Object Mapper (FOM) library as a reference for what you might want to do with your own library.

To encourage people to add the new FluidDB capabilities to libraries, we’re going to extend the FluidDB Weekend of Code offer to library authors. Let us know when you’re working on your library and where we can find it (Github, Bitbucket, Sourceforge etc) and we’ll order you a pizza and send you a book of your choice from Amazon.

Finally, we’re moving to a four-week development cycle so expect regular updates, pro-active bug squashing and lots of progress in the coming months. We’ve got lots of exciting stuff in the pipeline and we can’t wait to see how the FluidDB community reacts.

July 29, 2010

Top tweeters as followed by HN readers now in FluidDB

Filed under: Programming — Terry Jones @ 5:57 pm

Yesterday Jeff Miller posted some interesting data on the Twitter users most followed by readers of Hacker News.

I just took those top 100 Tweeters and added Jeff’s data (their rank and the fraction of HN readers who follow them) to FluidDB. The tags I used in FluidDB are ycombinator.com/top-100 and ycombinator.com/follow-percent. The top-100 tag has values that are the Twitter user’s rank (from 1 to 100), and the follow-percent tag holds the (floating point) percentage of Hacker News readers that follow that Twitter user, as found by Jeff.

What does this all mean?

It means you can now query on Jeff’s data using FluidDB. And because FluidDB contains various other pieces of information about Twitter users, you can combine his data with other data in searches – including searches that Jeff probably never anticipated (and, because of FluidDB, never had to anticipate).

It also means you can add to the data. All you need is a FluidDB account (sign up) and then you can take the FluidDB API for a spin (docs).

To see the kinds of things that are possible, you can also do some queries using the advanced tab of Tickery.

For example, Who are more than 20.0 percent of HN readers following that have a TunkRank score of at least 60?

Or, Who is in the HN top 100 that I have met?

Or, Who of the top 100 do I follow?

The possibilities are endless. The main point of FluidDB is that you can play too. You can add your own data (any data) to the exact same objects that I’ve put Jeff’s data onto and which Tickery and TunkRank and We Met At are all using – and you don’t have to ask permission.

We’ve written plenty more on this subject. See also Tickery, for programmers, TunkRank scores added to FluidDB, Putting metadata onto tweets with FluidDB and FluidDB as a universal metadata engine.

You can get all the code I used to put the data into FluidDB from our hackernews repo on GitHub. It was about 90 minutes of work from start to finish.

Have fun, and please comment below!

July 20, 2010

Open sourcing Tickery

Filed under: Programming — Terry Jones @ 6:23 am

Today we’re excited to announce that we’ve open sourced Tickery under the Apache License. You can download the source from the Fluidinfo repository on Github. If you’re not familiar with Tickery, you can go play with it and also read our two blog posts, Meet Tickery and Tickery, for programmers.

We’ve open sourced Tickery in order to show other developers the insides of a non-trivial application that uses FluidDB. Tickery was written over a three-month period (November 2009 to January 2010), and much of it was done at a fairly fast pace. While the code could be cleaner and better documented, it’s not bad. We’re of course keen to help people understand the code, so please feel free to join the FluidDB users mailing list, or join us in #fluiddb on irc.freenode.net. Naturally we’ll be happy and interested to receive improvements or patches, and you can of course run your own instance of Tickery.

Tickery is written entirely in Python, and was built using a number of other open-source tools, including Twisted, Pyjamas, txFluidDB, txRDQ, txJSON-RPC, and Ply. Thanks to all those projects for their openness and support.

We also had the benefit of lots of help from Luke Leighton and the other Pyjamas developers – thanks!

May 24, 2010

FluidDB enters alpha

Filed under: Programming,Progress — Terry Jones @ 11:38 am

We’re using the Techcrunch Disrupt event to launch FluidDB into a real alpha. Until today we’ve only let a small number of people in to play with the API, and we’ve been giving away API passwords by hand. As of today, we’re taking the brakes off a little, allowing anyone to sign up and begin using the FluidDB API. Of course to do that it will help enormously if you’re a programmer 🙂

Although FluidDB has been up and running for 9 months, we’re being careful not to raise expectations too quickly. So for now we’re still labeling it an “alpha”. We have concrete plans for what will constitute a beta—these are mainly to do with speed and with adding flexibility to the API to reduce the number of calls apps have to make—and plan to be in beta by the end of 2010. Now that we have our funding cleared up, and can hire more developers, you can expect FluidDB development to ramp up quickly.

Please feel free to comment below. We’re listening!

January 21, 2010

Tickery, for programmers

Filed under: Essence,Programming — Terry Jones @ 5:21 pm

If you’re a programmer and you’ve played around with Tickery, it should be clear that Tickery is functionally very simple when looked at from a traditional database perspective. Tickery looks like an application in its own right. It tries to offer something so simple that anyone (at least any Twitter user) can understand and use the Simple “Enter two Twitter usernames” functionality. But we actually designed and built Tickery mainly as a demo of what’s possible with Fluidinfo (description, API). So here are some first details on how Tickery uses Fluidinfo, and how you can use it too.

Fluidinfo objects

The most important thing to understand about Fluidinfo initially is that it maintains a collection of tagged objects that are not owned. Tags have permissions and a tag on an object can optionally have a value associated with it. Here’s a conceptual view of an object that has two tags on it that were added by Tickery.

The long identifier is the object’s unique id in Fluidinfo. The left column shows tag names, such as twitter.com/users/screen_name, and the right column shows the value (if any) of the tag on the object. Because objects in Fluidinfo are not owned, anyone (which is to say any application) can put additional tags onto this object. I’m going to ignore permissions in what follows – that’s a subject for a separate posting.

Any application can find this object in Fluidinfo, using a simple query, like twitter.com/users/id = 42983 or twitter.com/users/screen_name = "terrycojones".

Now let’s suppose Tickery adds @esteve to Fluidinfo, and wants to indicate that Esteve currently follows terrycojones. Tickery creates a new tag, twitter.com/friends/esteve in Fluidinfo, and adds it to the above object. The object then looks like this:

Similarly, Tickery adds a twitter.com/friends/esteve to the objects representing all the Twitter users Esteve follows. At this point it is easy to retrieve all those users via the Fluidinfo query has twitter.com/friends/esteve (i.e., get me all the objects that have a twitter.com/friends/esteve tag, irrespective of the tag’s value, if any).

Suppose Tickery now adds another Twitter user, @fergusstothart who currently follows me. It adds another tag to the object, resulting in

and also puts a twitter.com/friends/fergusstothart tag onto the objects for the other users that Fergus follows. It finds these objects via a Fluidinfo query (using the Twitter id of Fergus, obtained from the Twitter API). If Tickery needs to tag an object for a user that it hasn’t created yet, it simply creates a new object for that user, and tags it.

Given the above, we’ve seen enough to know how Tickery does most of its work. For example, getting things like the set of people Esteve and Fergus follow in common is just an and query has twitter.com/friends/esteve and has twitter.com/friends/fergusstothart, or the set of people Esteve follows but Fergus does not has twitter.com/friends/esteve except has twitter.com/friends/fergusstothart, etc.
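The meaning of has, and and except can be modelled with ordinary Python sets. The sketch below is an in-memory illustration only and does not talk to Fluidinfo; apart from terrycojones, the object names are invented placeholders.

```python
ESTEVE = "twitter.com/friends/esteve"
FERGUS = "twitter.com/friends/fergusstothart"

# Each object is represented by the set of tag names attached to it.
objects = {
    "terrycojones": {ESTEVE, FERGUS},
    "example_user_a": {ESTEVE},
    "example_user_b": {FERGUS},
}


def has(tag):
    """Objects carrying the tag, whatever its value."""
    return {name for name, tags in objects.items() if tag in tags}


followed_by_both = has(ESTEVE) & has(FERGUS)   # "and" is set intersection
esteve_only = has(ESTEVE) - has(FERGUS)        # "except" is set difference

print(sorted(followed_by_both))
print(sorted(esteve_only))
```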

Where we come in

That’s all well and good, but it’s all about Tickery. What if other programmers, who perhaps don’t care or even know about Tickery, want to add data and search on it too? In a normal database or with a normal application, you’d probably expect to have to ask permission. Then, supposing it was granted, you could only do the kinds of things that had been anticipated and provided for.

But in Fluidinfo it’s completely different. Any application can come along and put whatever it likes onto the above object, or any other object (that it can find). As a trivial example, Esteve and I have also added tags to the Fluidinfo objects to indicate which of the people we follow we have also met in person. Esteve has an esteve/met tag, and because we’ve met (in fact we built Fluidinfo together), he has put that tag onto the above object:

Think about what just happened. An unknown 3rd party (well, let’s pretend Esteve was unknown) just came along, sometime after the Tickery data already existed in Fluidinfo, and added something completely new and unanticipated to an existing object, without asking for permission, and without in any way disturbing the original content. Esteve, or anyone else who can read his tag, can now do interesting searches, like has twitter.com/friends/esteve except has esteve/met, which shows the people he follows but has not yet met. Further, his searches seamlessly combine the existing Tickery data with his own data, and could also include other tags that other applications add.

That kind of unanticipated use of information, flexibility in representation and search, and change of control is what’s at the heart of Fluidinfo.

Twitter lists

If you think about it, Esteve adding esteve/met tags to the objects for Twitter users is exactly like making a list in Twitter using their new lists function. But it’s more useful for two main reasons.

Firstly, you can query across lists, e.g., has terrycojones/met and has esteve/met will find people we have both met in person, and (has twitter.com/friends/esteve and has terrycojones/met) except has esteve/met will find people Esteve follows that I have met in person but whom he has not. As you can see, querying on lists makes them much more useful.

Secondly, you can use the Fluidinfo permissions system to control who can see or read your tags. So as well as a completely private list or a public one, you can have a list that’s visible just to some friends, or one that you let certain other people add to (by giving them write permission on the tag involved).

Permissions in Fluidinfo are very simple and very flexible, and because they apply at the level of the tag (not the object), you can control who can do what to individual pieces (tags) of an object. That’s a subject for another post, as I mentioned. You might like to have a read of the Fluidinfo permissions docs and/or check out Nicholas Radcliffe‘s post Permissions Worth Getting Excited About, plus see the comment by Nicholas Tollervey, who writes Fluidinfo’s "killer feature" is actually its permissions system and the implications thereof. It is so important that I’ll save that topic for its own blog post later.

More Tickery tags

Tickery also saves a few more Twitter user details onto objects in Fluidinfo. The object above has some additional tags:

You can put these into queries too, of course. The Tickery Advanced tab lets you type them in, e.g., I can see which of the people that Jack follows are very popular has twitter.com/friends/jack and twitter.com/users/followers_count > 100000.
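Outside Tickery, the same query text can be sent to the HTTP API as a URL parameter. The sketch below only builds the URL and sends nothing; the host name and the /objects?query= form are assumptions here and should be checked against the API docs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://fluiddb.fluidinfo.com"  # assumed host


def query_url(query):
    """Return a URL carrying the query, with spaces, slashes and
    comparison operators safely encoded."""
    return "%s/objects?%s" % (BASE_URL, urlencode({"query": query}))


query = ("has twitter.com/friends/jack and "
         "twitter.com/users/followers_count > 100000")
url = query_url(query)
print(url)

# Decoding the URL gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0])
```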

Running on ahead of Tickery

Finally, here’s a subtle but very important point. What if you write an application that uses Fluidinfo to store data, and you want it to interact fully with Tickery, but you want to store information about a user that Tickery doesn’t know about yet?

This is crucially important because it’s about information control, and if control is completely in the hands of Tickery, other developers will be less likely to want to add information. Exactly this scenario plays out in many domains: e.g., suppose Amazon released something that let you indicate which books you own, but that you own things that are not in the Amazon database. How can you run on ahead of Amazon to insert your data before they create an object for the book, if ever? How can you do it in a way so that when they finally do create the book your data and theirs are seamlessly joined without anyone having to lift a finger or even be aware of the other party?

This is one area in which the special Fluidinfo “about” tag (full name fluiddb/about) makes all the difference. You can read about it here, and be sure to check out Nicholas Radcliffe‘s blog which is titled, not coincidentally, About Tag.

Tickery uses the Fluidinfo about tag to hold a Twitter user id, like this:

There’s a ton that could be written about this. Very briefly, the about tag is immutable and can only be set on an object when it’s created (in fact the about tag shown above was put there when the object was made). So, if you want to add data to a Twitter user that Tickery hasn’t gotten to, just look up the user’s Twitter id (say XXXX) with the Twitter API, create the object in Fluidinfo with about tag twitter.com:uid:XXXX, and put your tags onto that object. If Fluidinfo doesn’t have an object with that about tag, it will make one for you. When/if Tickery gets around to adding its information for that user, it will put it in the same place. Magic.
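The behaviour that makes this work is that asking for an object by its about value either finds the existing object or creates it. A small in-memory model shows why two applications end up on the same object. This is an illustration only, not Fluidinfo's implementation; the class TinyStore and the tag example/owns-book are invented.

```python
import uuid


class TinyStore:
    """In-memory stand-in for objects addressed by their about value."""

    def __init__(self):
        self._id_by_about = {}
        self.tags = {}

    def object_about(self, about):
        """Return the id of the object with this about value, creating
        the object first if it does not exist yet."""
        if about not in self._id_by_about:
            object_id = str(uuid.uuid4())
            self._id_by_about[about] = object_id
            self.tags[object_id] = {"fluiddb/about": about}
        return self._id_by_about[about]

    def tag(self, about, tag_path, value=None):
        """Attach a tag (with optional value) and return the object id."""
        object_id = self.object_about(about)
        self.tags[object_id][tag_path] = value
        return object_id


store = TinyStore()
# A third-party application arrives first...
mine = store.tag("twitter.com:uid:42983", "example/owns-book", True)
# ...and Tickery adds its own tag later, landing on the same object.
theirs = store.tag("twitter.com:uid:42983",
                   "twitter.com/users/screen_name", "terrycojones")
print(mine == theirs)
print(sorted(store.tags[mine]))
```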

Convenience API

As a convenience, though note that it’s optional and its use is up to you, Tickery provides a small API that you can use to have it put its twitter.com/users/screen_name and twitter.com/users/id tags onto objects for you and give you the Fluidinfo object id it’s using for a user.

E.g., if you do an HTTP GET on http://tickery.net/api/screennames/terrycojones you’ll see the object id from our examples above. Or if you happened to know my Twitter user id (42983) via the Twitter API, you could do a GET on http://tickery.net/api/uids/42983 and receive the same thing.
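For scripting, the two lookup URLs can be built like this. It is only a sketch in Python 3: the function names are hypothetical, and because the response format isn’t described here, the code stops at constructing the URL and leaves the GET to whatever HTTP client you prefer.

```python
from urllib.parse import quote

TICKERY_API = "http://tickery.net/api"


def screenname_url(screen_name):
    """URL for looking a user up by Twitter screen name."""
    return "%s/screennames/%s" % (TICKERY_API, quote(screen_name, safe=""))


def uid_url(uid):
    """URL for looking a user up by numeric Twitter user id."""
    return "%s/uids/%d" % (TICKERY_API, int(uid))


print(screenname_url("terrycojones"))
print(uid_url(42983))
```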

Truly social data

This API is just for convenience. Tickery uses the about tag in order to be able to share Fluidinfo objects with other apps – including apps that want to add information about a user that Tickery has not gotten to yet. Just like Fluidinfo, Tickery wants to encourage what we call Truly Social Data. Tickery doesn’t place itself in the center, doesn’t make its data more important than anyone else’s, and doesn’t act as a gatekeeper.

In fact, it gets even better: a normal user can turn around and stop Tickery from reading the data that Tickery stored on the user’s behalf. That’s as it should be. Users should have control over their data, and a choice of application shouldn’t result in lock-in.

Getting access

Fluidinfo is still very new, and we’re in a private alpha phase. If you’d like to use the API, there are two steps: 1) reserve a username, and 2) send us email mentioning the name you reserved and a line or two about what you’d like to do. We apologize for this early restriction – please rest assured that we’re planning to open Fluidinfo up to everyone before too long. That’s the whole point.

More soon. Thanks very much for reading!

December 1, 2009

Putting metadata onto tweets with Fluidinfo

Filed under: Essence,Programming — Terry Jones @ 3:37 pm

Various articles have recently discussed adding metadata to Twitter tweets – see the posts by Nova Spivack, Robert Scoble, and Dave Winer (who also suggests we need a programming language built into a Twitter client).

These are the sorts of things that Fluidinfo was designed to support, and you can do them today. If you want a password to start playing with the Fluidinfo API, send email to api at fluidinfo dot com and we’ll set you up.

In the meantime, here are some examples. I’m doing this at the iPython command line, using the Fluid Object Mapper (FOM) library, written by Ali Afshar. FOM provides a natural way to work with Fluidinfo objects, namespaces, tags, etc. But you could use any client-side software you like. The Fluidinfo API is just HTTP.

First, let’s get a connection to Fluidinfo:

from fom.session import Fluid

fdb = Fluid()
fdb.db.client.login('terrycojones', 'PASSWORD')
fdb.bind()

That last line is a bit of internal FOM magic that makes interactive use simpler in what follows. Ignore it for now.

To put metadata onto a tweet, we’ll first ask Fluidinfo for the object that’s about a particular tweet. Let’s take the one in the image above by @novaspivack. That tweet has a URL of http://twitter.com/novaspivack/status/4999653280. We ask Fluidinfo to give us the object “about” that URL:

from fom.mapping import Object

o = Object()
o.create('http://twitter.com/novaspivack/status/4999653280')
o.uid
>>> u'ab7fa032-06df-45be-9bb2-859c18c4d342'

The argument in the o.create call is the value of the Fluidinfo about tag. If an object with that about tag already exists, Fluidinfo gives it to us. Otherwise, a new object with that about tag is created. As you can see, the object also has an identifier (o.uid). In case you’re not familiar with Python, the “u” printed in front of the id indicates that the value is a unicode string.

This is a first point of interest. We’ve just created a Fluidinfo object corresponding to an arbitrary tweet. We didn’t ask for permission, we just did it. It’s a bit like a wiki: you can ask a wiki for its page on anything, and if no such page exists, the wiki just makes you a new one. Fluidinfo does the same thing with its objects and about tag. If you want to think about Fluidinfo in all its generality, you should now consider that the about tag above could have been for any tweet, including tweets that don’t exist (or don’t yet exist), for any URL, in fact for any string. We also could have followed Nova’s suggestion and used an about value like "twitter.com/id=4999653280". But we’re getting ahead of ourselves.

Fluidinfo has a simple query language, so let’s quickly confirm that we can find this object with a search:

fdb.objects.get('fluiddb/about = "http://twitter.com/novaspivack/status/4999653280"')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

The 200 is an HTTP OK status telling us the call succeeded, and you can see one object matched the search and that its id is as expected.
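Since the API is just HTTP, the same search can be expressed as a plain URL. The sketch below (Python 3) assumes the query travels in a query parameter on a GET to /objects, which is worth confirming in the API docs; it only builds and decodes the URL and makes no request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://fluiddb.fluidinfo.com"


def query_url(query):
    """Build a search URL; the 'query' parameter name is an assumption."""
    return "%s/objects?%s" % (BASE, urlencode({"query": query}))


query = 'fluiddb/about = "http://twitter.com/novaspivack/status/4999653280"'
url = query_url(query)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```

The point of encoding the query is that characters such as spaces, quotes and slashes survive the trip inside a URL unchanged.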

So how about some metadata? Let’s say I want to add a rating to the object. Here’s a bit of one-time setup. First I get my top-level namespace (which corresponds to my Fluidinfo user name). Then I create a new tag called rating in that namespace:

from fom.mapping import Namespace

ns = Namespace('terrycojones')
ns.create_tag("rating", "A tag for Terry's ratings.", False)

The False argument is telling Fluidinfo that I don’t want the tag to be indexed. Ignore that for now.

The magic of FOM lets us directly examine the tag using Python attributes. So you can get the tag and see its description like so:

rating = ns.tag('rating')
rating.description
>>> u"A tag for Terry's ratings."

At this point we have a new tag, or an abstract tag if you prefer, but we haven’t actually tagged any objects with it. So let’s tag the object we created above for Nova’s tweet:

o.set('terrycojones/rating', 6)

That was pretty easy! The Fluidinfo object that’s about Nova’s tweet now has some metadata on it, a ‘terrycojones/rating’ tag, with a value of 6. Let’s make sure we can get that value back:

o.get('terrycojones/rating')
>>> (6, None)

We get a 2-tuple whose second value is None when the tag’s value is a primitive Python type (in this case an integer).

Let’s do a couple of quick searches for objects with terrycojones/rating tags:

fdb.objects.get('terrycojones/rating = 6')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

fdb.objects.get('terrycojones/rating > 4')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

fdb.objects.get('has terrycojones/rating')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

In each case just that one object is returned, as expected. Note that the last query just tests for the presence of the tag, irrespective of the tag’s value (if any).

So there you have it: arbitrary metadata on tweets, and with a query language to help find things.

But let’s press on and see how things get more interesting.

First of all, you may have noticed that I didn’t have to deal with permissions at all in the above. I was able to create the Fluidinfo object about Nova’s tweet and to tag it without asking permission. In Fluidinfo that’s always the case.

But there is a permissions system. Let’s log in as a different user and try a few things to see how it works. First, I’ll log in as njr, another user whose password I happen to know:

fdb.db.client.login('njr', 'PASSWORD')

The njr user is actually Nicholas Radcliffe who has written several great introductory articles about Fluidinfo over at About Tag.

Let’s try (as Nick) getting the terrycojones/rating tag from the object for Nova’s tweet:

o.get('terrycojones/rating')
>>> (6, None)

That still works, so we can infer that the terrycojones/rating tag is readable by the njr user. Let’s log in as terrycojones again and have a look at the permissions:

fdb.db.client.login('terrycojones', 'PASSWORD')
fdb.permissions.tag_values['terrycojones/rating'].get('read')
>>> (200, {u'exceptions': [], u'policy': u'open'})

We’ve asked Fluidinfo for read permissions on tag values for the tag terrycojones/rating. The result is a general policy (open), with exceptions (currently empty). Now I’ll put the njr user into the exceptions list, and confirm the result:

fdb.permissions.tag_values['terrycojones/rating'].put('read', 'open', ['njr'])
>>> (204, None)
fdb.permissions.tag_values['terrycojones/rating'].get('read')
>>> (200, {u'exceptions': [u'njr'], u'policy': u'open'})

The 204 status above is just the HTTP way of telling us that the call succeeded and that the reply has no content (as expected).

Now let’s reconnect as njr and try getting the terrycojones/rating tag again:

fdb.db.client.login('njr', 'PASSWORD')
o.get('terrycojones/rating')
>>>

You can see we got nothing back. If FOM handled non-OK HTTP responses a little more carefully, you’d see that this request actually got a 401 (Unauthorized) status, meaning permission was denied. Fluidinfo is now refusing to let njr read the tag.
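The policy-plus-exceptions idea is easy to model locally. This Python 3 sketch is not Fluidinfo’s implementation, just the rule as it appears from the outside; the function name is made up, and the closed policy is assumed to be the mirror image of the open policy shown in this post.

```python
def is_allowed(user, policy, exceptions):
    """Return True if `user` passes a policy/exceptions check.

    Model: 'open' admits everyone except the listed users;
    'closed' (assumed counterpart) admits only the listed users.
    """
    if policy not in ("open", "closed"):
        raise ValueError("unknown policy: %r" % policy)
    listed = user in exceptions
    return not listed if policy == "open" else listed


# Before and after njr is put into the exceptions list.
print(is_allowed("njr", "open", []))
print(is_allowed("njr", "open", ["njr"]))
print(is_allowed("terrycojones", "open", ["njr"]))
```

An open policy with a short exceptions list works as a blacklist, and a closed policy with a short exceptions list works as a whitelist, so one small structure covers both cases.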

Nick already has a rating tag, called njr/rating, so let’s go get it, make sure there’s not one already on our object, and then tag our object with it:

ns = Namespace('njr')
rating = ns.tag('rating')
o.get('njr/rating')
o.set('njr/rating', 4)
o.get('njr/rating')
>>> (4, None)

Now things are getting interesting. We have tags from different users on the same object. That’s part of the point of Fluidinfo and it’s where the value comes from: putting information together allows you to do nice things, like query on it. After re-connecting as terrycojones, I can now do queries like this:

fdb.objects.get('terrycojones/rating > 5 and njr/rating > 3')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

fdb.objects.get('terrycojones/rating > 5 and njr/rating < 3')
>>> (200, {u'ids': []})

fdb.objects.get('has terrycojones/rating and njr/rating >= 4')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

fdb.objects.get('has terrycojones/rating and has njr/rating')
>>> (200, {u'ids': [u'ab7fa032-06df-45be-9bb2-859c18c4d342']})

fdb.objects.get('has terrycojones/rating except has njr/rating')
>>> (200, {u'ids': []})
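One way to read these queries is as set operations over object ids: and behaves like intersection and except like difference. The Python 3 sketch below is a local model only and says nothing about how Fluidinfo evaluates queries internally; the shortened id, the second object and the helper names are all invented for illustration.

```python
# Hypothetical in-memory stand-in: object id -> {tag path: value}.
objects = {
    "ab7fa032": {"terrycojones/rating": 6, "njr/rating": 4},
    "other-id": {"njr/rating": 2},
}


def has(tag):
    """Ids of objects carrying the tag, whatever its value."""
    return {oid for oid, tags in objects.items() if tag in tags}


def where(tag, predicate):
    """Ids of objects carrying the tag with a value satisfying predicate."""
    return {
        oid for oid, tags in objects.items()
        if tag in tags and predicate(tags[tag])
    }


# 'terrycojones/rating > 5 and njr/rating > 3' as an intersection.
both = where("terrycojones/rating", lambda v: v > 5) & where("njr/rating", lambda v: v > 3)
print(both)

# 'has terrycojones/rating except has njr/rating' as a difference.
only_terry = has("terrycojones/rating") - has("njr/rating")
print(only_terry)
```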

There’s a lot more I could do too, like giving Nick permission to add terrycojones/rating tags to objects. By the way, Nick has written some nice articles about the Fluidinfo permissions model. See Permissions Worth Getting Excited About and The Permissions Sketch.

For a final look at metadata, let’s put something totally different onto our object:

ns = Namespace('terrycojones')
page = ns.create_tag("page", "Terry's page tag.", False)
o.set('terrycojones/page', 'hiHello there!', 'text/html')

I’ve just made a new tag called terrycojones/page and tagged our object with it. What’s different here is that the value is a string, and I’m passing a MIME type with it. If I retrieve the value of the tag on the object, you’ll see the MIME type comes back too:

o.get('terrycojones/page')
>>> ('hiHello there!',
 'text/html')

and as you might hope, if you go get that tag from that object using a browser, the MIME type is returned in the HTTP Content-type header, so you end up with a real web page, with a predictable URL. Try clicking: http://fluiddb.fluidinfo.com/objects/ab7fa032-06df-45be-9bb2-859c18c4d342/terrycojones/page. We can do the same for any MIME type at all – including ones you invent for your own convenience.
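The predictable URL is just the object id followed by the tag path. A tiny Python 3 helper (its name is hypothetical) makes the pattern explicit, using the host and values from this post:

```python
BASE = "http://fluiddb.fluidinfo.com"


def tag_value_url(object_id, tag_path):
    """URL of a tag's value on an object: /objects/<object id>/<tag path>."""
    return "%s/objects/%s/%s" % (BASE, object_id, tag_path)


print(tag_value_url("ab7fa032-06df-45be-9bb2-859c18c4d342", "terrycojones/page"))
```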

So there you go. That’s metadata on tweets. With a permissions model, with a query language, with user identity, with the freedom to add anything you want, and with typed data. We don’t need a new programming language for doing this sort of thing. What we need is a better data architecture.

Fluidinfo was designed with exactly this kind of use in mind. And it’s not specific to Twitter or tweets, or anything in fact. So you can put metadata onto anything you like, search on it, continue to own/control your own data, combine it as you like, and get data in and out using a simple HTTP API.

This is all live. It’s up and running, and you can do this today. I should also add that Fluidinfo is still an early alpha, and is not yet particularly fast. For more information on Fluidinfo, start with the high-level description and if you’re a programmer, read the API docs.

Next time I’ll show you how we’re putting metadata onto Twitter users, and how you can too, of course! I might also start to talk about Tickery, our upcoming Twitter query application.

If you like all this, please pass on this article. We’d love to get the word out about Fluidinfo. It’s a little difficult from Barcelona.

September 17, 2009

FluidDB Weekend of Code

Filed under: Events,Programming — Terry Jones @ 4:21 pm

Image: gui.tavares


Based extremely loosely on Google’s Summer of Code program, we’re pleased to announce the FluidDB Weekend of Code offer. Here’s the deal.

You have a go at writing a client-side library for the FluidDB HTTP API in a programming language for which no library currently exists (here’s the current list). We send you a new copy of the book of your choice for that language, plus a large pizza to keep you going. You release your code as open source, and we link to it & put your name up in lights on the libraries page.

So if you’d like to play around with a new programming language and want a fun project to tackle, why not have a go? There’s no formal commitment, and no strings attached. We’ll send you a book to help, and you get to keep it no matter what.

For example, there’s no Scala library yet. We’d love to have one, and would be delighted to send you a copy of the new Programming Scala book from O’Reilly. Or a copy of Erlang Programming, or maybe Real World Haskell takes your fancy. Or, write a library in JavaScript, or C, etc. If there’s a book that can help you (even to learn some entirely other language), we’ll ship it. We’ll also be happy to help you if you join us in the #fluiddb channel on irc.freenode.net or sign up for the FluidDB-users mailing list.

Sound like fun? Send mail to info at fluidinfo dot com. We’ll probably just send one book per language, so please understand if we’ve already got someone working on your first choice. And if you already wrote a library, well thanks 🙂 (seriously, feel free to ask for a book too; it’d be a pleasure).
