This post is all about querying the O’Reilly book and author information recently imported into Fluidinfo. If you want the skinny on Fluidinfo’s query language in glorious in-depth techno-geek-speak then check out the documentation. If you’d rather see some real world examples, read on…
In Fluidinfo, objects represent things (and all objects have a unique id). Information is added to objects using tags. Tags can have values, and tag names are organized into namespaces that give them context. Permissions control who can see and use namespaces and tags.
Objects do not belong to anyone and don’t have permissions associated with them. They’re openly writable. Anyone can tag anything to any object. Many objects have a special globally unique “about” tag value that indicates what they are about. Interaction with Fluidinfo is via a REST API.
That’s Fluidinfo in a nutshell.
In another article published today I describe the Fluidinfo tags and namespaces used to annotate objects with O’Reilly data. The tags are attached to objects for O’Reilly books and authors. Both kinds of objects have about tags. So a trivial first kind of query is to go directly to an object that’s about a book. For example, to get information about the object representing the book “Open Government” visit the URL http://fluiddb.fluidinfo.com/about/book:open government (daniel lathrop; laurel ruma).
You’ll get back a JSON response containing the object’s globally unique id and a list of all the tags attached to that object that you have permission to read. Similarly, you can go directly to the object for an O’Reilly author: http://fluiddb.fluidinfo.com/about/author:tim oreilly.
In case you’re wondering about the format of these book and author about tags, we used the abouttag library written by Nicholas Radcliffe to generate them. They’re designed to be readable, easy to generate programmatically, and unlikely to result in collisions. You don’t have to remember them though, as there are many other ways to get at objects, via querying, as we’re about to see.
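The about URLs above contain spaces, colons, parentheses and semicolons, so a program would normally percent-encode the about value before requesting it. Here is a minimal sketch using only Python's standard library. The helper name about_url is made up for illustration, and encoding the whole value as a single path segment is an assumption rather than something this post prescribes.

```python
from urllib.parse import quote

# Host taken from the URLs in this post.
BASE_URL = "http://fluiddb.fluidinfo.com"


def about_url(about_value):
    """Return the /about URL for an object (hypothetical helper).

    safe="" makes quote() encode ":" and "/" as well, so the about
    value always stays a single path segment.
    """
    return BASE_URL + "/about/" + quote(about_value, safe="")


book_url = about_url("book:open government (daniel lathrop; laurel ruma)")
author_url = about_url("author:tim oreilly")
print(book_url)
print(author_url)
```

The server is expected to decode the path before looking up the object, so the encoded and unencoded forms should refer to the same about value.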
Queries on tags and their values
Below are some examples of using Fluidinfo’s query language.
Presence
Return all the objects that have an O’Reilly title:
has oreilly.com/title
You can see the results at the following URL: http://fluiddb.fluidinfo.com/objects?query=has oreilly.com/title. Once again, the result is in JSON. It simply contains a list of the ids of matching objects (representing things that O’Reilly have tagged with a title).
That’s the equivalent of the following SQL statement:
SELECT id FROM oreilly.com WHERE title IS NOT NULL;
Caveat: There are no tables in Fluidinfo so it’s impossible to make a direct translation to SQL. This example and those that follow simply illustrate a conceptual equivalence to make it easier for those of you familiar with SQL to get your heads around the Fluidinfo query language.
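The query URLs in this post are written with literal spaces and symbols so they are easy to read. When building them in code you would usually let a library do the encoding. The following is a small sketch with Python's standard library; objects_url is a hypothetical helper name, not part of any Fluidinfo client.

```python
from urllib.parse import urlencode

# Host taken from the URLs in this post.
BASE_URL = "http://fluiddb.fluidinfo.com"


def objects_url(query):
    """Return the /objects URL for a Fluidinfo query string (hypothetical helper)."""
    # urlencode() turns spaces into "+" and encodes "/", "<" and quotes.
    return BASE_URL + "/objects?" + urlencode({"query": query})


presence_url = objects_url("has oreilly.com/title")
price_url = objects_url("oreilly.com/price-us < 4000")
print(presence_url)
print(price_url)
```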
Comparison
Return all the O’Reilly objects whose price is less than $40 (the price is stored in cents).
oreilly.com/price-us < 4000
Here it is as a URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/price-us < 4000
In SQL it would be:
SELECT id FROM oreilly.com WHERE "price-us" < 4000;
Text Matching
Return all the O'Reilly objects that have "Python" in the title.
oreilly.com/title matches "Python"
The resulting URL: https://fluiddb.fluidinfo.com/objects?query=oreilly.com/title matches "Python"
In SQL:
SELECT id FROM oreilly.com WHERE title LIKE '%Python%';
Set Contents
Return all the O'Reilly objects representing authors who were involved in writing the work with ISBN "9781565923607" (which is the unique ID O'Reilly use in their catalog). The value of oreilly.com/authors/works tags is always a set of unique ISBN numbers like this: ["9781565923607", "9781565563728", "9781627397284"].
oreilly.com/authors/works contains "9781565923607"
The resulting URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/authors/works contains "9781565923607"
In SQL:
SELECT id FROM oreilly.com/authors WHERE '9781565923607' in (SELECT works FROM oreilly.com/authors);
(Actually, the similar "IN" operation in SQL isn't a very good example since it results in verbose monstrosities like the above.)
Exclusion
Return all the O'Reilly books that were published in 2010 except those published in April.
oreilly.com/publication-year=2010 except oreilly.com/publication-month=4
The resulting URL: https://fluiddb.fluidinfo.com/objects?query=oreilly.com/publication-year=2010 except oreilly.com/publication-month=4
In SQL:
SELECT id FROM oreilly.com WHERE year = 2010 AND month <> 4;
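One way to picture "except" is as a set difference between two result sets. The sketch below models that in plain Python, assuming "except" really does behave this way. The object ids and tag values are made up for illustration and nothing here talks to Fluidinfo.

```python
# Hypothetical objects: made-up ids mapped to the tags they carry.
objects = {
    "book-a": {"oreilly.com/publication-year": 2010, "oreilly.com/publication-month": 4},
    "book-b": {"oreilly.com/publication-year": 2010, "oreilly.com/publication-month": 9},
    "book-c": {"oreilly.com/publication-year": 2010},  # no month tag at all
    "book-d": {"oreilly.com/publication-year": 2009, "oreilly.com/publication-month": 4},
}


def matching(tag, value):
    """Return the ids of objects that carry `tag` with exactly `value`."""
    return {oid for oid, tags in objects.items() if tags.get(tag) == value}


published_2010 = matching("oreilly.com/publication-year", 2010)
published_april = matching("oreilly.com/publication-month", 4)

# "A except B" modelled as: everything in A that is not also in B.
result = published_2010 - published_april
print(sorted(result))  # ['book-b', 'book-c']
```

Note that book-c, which has no month tag, stays in the result. In SQL a row with a NULL month would fail the month <> 4 test, which is one more reason the SQL is only a conceptual equivalent.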
Logic
It's possible to use the "and" and "or" logical operators. For example, return all the O'Reilly books whose title matches "Python" and that were published before 2005:
oreilly.com/title matches "Python" and oreilly.com/publication-year < 2005
The resulting URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/title matches "Python" and oreilly.com/publication-year < 2005
In SQL:
SELECT id FROM oreilly.com WHERE title LIKE '%Python%' AND year < 2005;
Grouping
Return all the objects representing O'Reilly books mentioning "Python" in their title that were published in either 2008 or 2010.
oreilly.com/title matches "Python" and (oreilly.com/publication-year=2008 or oreilly.com/publication-year=2010)
The resulting URL: http://fluiddb.fluidinfo.com/objects?query=oreilly.com/title matches "Python" and (oreilly.com/publication-year=2008 or oreilly.com/publication-year=2010)
In SQL:
SELECT id FROM oreilly.com WHERE title LIKE '%Python%' AND (year = 2008 OR year = 2010);
Querying across different data sets
Fluidinfo can query seamlessly across tags from different sources that are stored on the same object. E.g., return all the objects for O'Reilly books that Terry Jones owns.
has oreilly.com/title and has terrycojones/owns
The resulting URL: http://fluiddb.fluidinfo.com/objects?query=has oreilly.com/title and has terrycojones/owns
In SQL:
Well, it's actually not clear how you'd do this in SQL. Presumably there'd need to be some kind of table join, supposing that were possible!
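The reason no join is needed is that tags from different namespaces sit on the same object. Here is a toy model in plain Python, with made-up object ids, titles and tag values, that treats "has X and has Y" as the intersection of two sets of ids. It is only a sketch of the idea, not a description of how Fluidinfo is implemented.

```python
# Hypothetical objects carrying tags from two unrelated namespaces.
objects = {
    "obj-1": {"oreilly.com/title": "Example Book A", "terrycojones/owns": True},
    "obj-2": {"oreilly.com/title": "Example Book B"},
    "obj-3": {"terrycojones/owns": True},  # something owned that is not an O'Reilly book
}


def has(tag):
    """Return the ids of objects that carry `tag`, whatever its value."""
    return {oid for oid, tags in objects.items() if tag in tags}


# "has oreilly.com/title and has terrycojones/owns" as a set intersection.
owned_books = has("oreilly.com/title") & has("terrycojones/owns")
print(sorted(owned_books))  # ['obj-1']
```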
Getting back tags on objects matching a query
It's also possible to indicate which tag values to return for each matching object. This is done by using the Fluidinfo /values HTTP endpoint and naming each wanted tag in a "tag" argument in the URL's query string. For example, if I wanted the title, author names and publication year of all the O'Reilly books with the word "Python" in the title published before 2006 then I'd use the following query:
oreilly.com/title matches "Python" and oreilly.com/publication-year < 2006
and append the wanted tags to the URL after the query (in any order):
&tag=oreilly.com/title&tag=oreilly.com/author-names&tag=oreilly.com/publication-year
This is similar to the following SQL:
SELECT title, authors, year FROM oreilly.com WHERE title LIKE '%Python%' AND year < 2006;
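Putting the query and the repeated tag arguments together by hand is error prone, so here is a small sketch that builds the full /values URL with Python's standard library. The helper name values_url is hypothetical. The encoding it produces is the same as the content-location header in the fluidinfo.py session near the end of this post.

```python
from urllib.parse import urlencode

# Host taken from the content-location header shown later in this post.
BASE_URL = "https://fluiddb.fluidinfo.com"


def values_url(query, tags):
    """Return a /values URL with one "tag" argument per wanted tag (hypothetical helper)."""
    # A list of pairs lets the "tag" key appear several times.
    params = [("query", query)] + [("tag", tag) for tag in tags]
    return BASE_URL + "/values?" + urlencode(params)


url = values_url(
    'oreilly.com/title matches "Python" and oreilly.com/publication-year < 2006',
    ["oreilly.com/title", "oreilly.com/author-names", "oreilly.com/publication-year"],
)
print(url)
```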
Fluidinfo returns a JSON object like this (shown here as the Python dictionary it decodes to, hence the u prefixes on the strings):
{u'results': {u'id': {
    u'1a91e021-7bce-4693-bfa5-0dc437fe1817': {
        u'oreilly.com/author-names': {u'value': [u'Anna Ravenscroft', u'David Ascher', u'Alex Martelli']},
        u'oreilly.com/publication-year': {u'value': 2005},
        u'oreilly.com/title': {u'value': u'Python Cookbook, Second Edition'}},
    u'1d25baae-b977-4ff4-bb77-01c52bd1d339': {
        u'oreilly.com/author-names': {u'value': [u'Fredrik Lundh']},
        u'oreilly.com/publication-year': {u'value': 2001},
        u'oreilly.com/title': {u'value': u'Python Standard Library'}},
    u'3360f05f-9bf4-4da5-abc0-0e3742809b98': {
        u'oreilly.com/author-names': {u'value': [u'Fred L. Drake Jr', u'Christopher A. Jones']},
        u'oreilly.com/publication-year': {u'value': 2001},
        u'oreilly.com/title': {u'value': u'Python & XML'}},
    u'9845b184-ef1b-46fb-8e7c-011da053dcb6': {
        u'oreilly.com/author-names': {u'value': [u'Andy Robinson', u'Mark Hammond']},
        u'oreilly.com/publication-year': {u'value': 2000},
        u'oreilly.com/title': {u'value': u'Python Programming On Win32'}},
}}}
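The result is nested as results, then id, then object id, then tag name, then value. As a rough sketch of how you might walk that structure, the snippet below uses a trimmed copy of the response above with only two of the four objects. The function name summarize is made up for illustration.

```python
# A trimmed copy of the response shown above (two of the four objects).
response = {
    "results": {
        "id": {
            "1a91e021-7bce-4693-bfa5-0dc437fe1817": {
                "oreilly.com/author-names": {
                    "value": ["Anna Ravenscroft", "David Ascher", "Alex Martelli"]
                },
                "oreilly.com/publication-year": {"value": 2005},
                "oreilly.com/title": {"value": "Python Cookbook, Second Edition"},
            },
            "1d25baae-b977-4ff4-bb77-01c52bd1d339": {
                "oreilly.com/author-names": {"value": ["Fredrik Lundh"]},
                "oreilly.com/publication-year": {"value": 2001},
                "oreilly.com/title": {"value": "Python Standard Library"},
            },
        }
    }
}


def summarize(response):
    """Return (year, title, authors) tuples sorted by year, then title."""
    rows = []
    for object_id, tags in response["results"]["id"].items():
        year = tags["oreilly.com/publication-year"]["value"]
        title = tags["oreilly.com/title"]["value"]
        authors = tags["oreilly.com/author-names"]["value"]
        rows.append((year, title, authors))
    return sorted(rows, key=lambda row: (row[0], row[1]))


for year, title, authors in summarize(response):
    print(f"{year}: {title} ({', '.join(authors)})")
```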
It's also possible to update and delete tag values from matching objects. This process is explained in detail in the Fluidinfo documentation and this blog post.
Finally, rather than interacting with Fluidinfo directly using the raw HTTP API it's a good idea to use one of the client libraries listed here. For example, using the fluidinfo.py library the last example query can be executed as follows:
>>> import fluidinfo
>>> import pprint
>>> headers, result = fluidinfo.call('GET', '/values', tags=['oreilly.com/title', 'oreilly.com/author-names', 'oreilly.com/publication-year'], query='oreilly.com/title matches "Python" and oreilly.com/publication-year < 2006')
>>> pprint.pprint(headers)
{'cache-control': 'no-cache',
'connection': 'keep-alive',
'content-length': '937',
'content-location': 'https://fluiddb.fluidinfo.com/values?query=oreilly.com%2Ftitle+matches+%22Python%22+and+oreilly.com%2Fpublication-year+%3C+2006&tag=oreilly.com%2Ftitle&tag=oreilly.com%2Fauthor-names&tag=oreilly.com%2Fpublication-year',
'content-type': 'application/json',
'date': 'Thu, 10 Mar 2011 15:17:58 GMT',
'server': 'nginx/0.7.65',
'status': '200'}
>>> pprint.pprint(result)
{u'results': {u'id': {u'1a91e021-7bce-4693-bfa5-0dc437fe1817': {u'oreilly.com/author-names': {u'value': [u'Anna Ravenscroft',
... etc ...
Learn more
Hopefully, this has explained enough to get you started. If you don't have a Fluidinfo account, you can sign up here. If you have any questions, please don't hesitate to get involved with the Fluidinfo community, contact us directly or join us on IRC. We'll be more than happy to help!
is it possible to use aggregate functions, such as count() ? – just trying to find out the number of entries in a subset…
Comment by Yosun Chang — April 14, 2011 @ 12:09 am