Fluidinfo

November 12, 2009

Why are Post-it notes sticky?

Filed under: Essence — Terry Jones @ 1:26 am
Image: PabloBM

[Update: it has been politely pointed out (in the comments following this post) by David Semeria that the Post-it note analogy to FluidDB came from him, not me!]

It’s a pretty simple question, but you may have never thought about it directly. It’s the stickiness of Post-it notes that makes them so extraordinarily useful. The stickiness allows us to put a note in the place that makes the most sense, in the place where its information will be in context, and where it will have the greatest utility.

That’s trivial, agreed, but I nevertheless find it interesting and instructive.


Image: someToast

Using Post-it notes we can add information to things in a very wide range of ways. The information might be about the thing to which it’s attached, or the object might just be something we anticipate being encountered at a relevant future moment.

How often do people ask for permission to attach a Post-it note to an object? Probably not very often. I’ve certainly never done it.

Image: someToast

The information on the Post-it note can’t be presented by the object the note is on. If the object had been designed to carry that information, there’d be no need for a Post-it note. So Post-it notes are almost by definition for adding information to things in unanticipated ways.

And as for the content of Post-it notes – that’s clearly highly unpredictable. I’ve illustrated this posting with some fun examples.

Image: _nickd

It’s very useful to be able to put information in its most natural or useful place. We do it all the time. The other day I returned to my apartment and taped to the inside wall of the elevator was a form for the neighbors of my building to enter their gas meter readings. The elevator was the perfect place for the notice. But it certainly wasn’t designed for that purpose. The representative of the gas company didn’t ask permission, they just taped up the form.

Image: Iain Farrell

I love thinking about how we work with information in the real world – especially the kinds of things we do so frequently or naturally that we barely notice them.

Image: wrestlingentropy

All of which brings me, inevitably, to FluidDB. As I’ve mentioned before several times, FluidDB objects have no owner. That means that anyone can put the digital equivalent of a Post-it onto anything they like, for whatever purpose, without asking for permission, and without anyone having to anticipate that they would want to do so.

Image: mulmatsherm

That’s all I’ll say for now, as I’m trying to keep this short. I hope it will be thought provoking. Think about how tightly controlled, how unspontaneous, and how awkward our typical computational experiences are. There’s a reason for that, and it’s rooted in information architecture. Think about how Post-it notes, in a simplistic but important way, make the world writable. And then, for extra points, think about FluidDB 🙂


I can’t resist one final image.

Image: Mr.Thomas

October 4, 2009

Digital hobgoblins

Filed under: Essence — Terry Jones @ 4:02 am

Much of the thinking behind FluidDB comes from considering how we work with information in the real world, and comparing that to how we do so in the computational world (aka Hobgoblin Hall). The differences are striking.

Over the years, I’ve often asked myself why things are so bizarre in the computational world and why we don’t do something about it. Without going into the answers to those questions, I’ll just say that I think we’ve all grown up in Hobgoblin Hall and despite the fact that we’re all perfectly familiar with the freedoms of the outside world, we take it for granted that things are deeply weird in our computational homes.

Here I’ll quickly outline a few of the more glaring oddities. (BTW, I’d be remiss not to point out that FluidDB has none of the following restrictions.)

Things must be named, and have one name. In the real world we have plenty of things that don’t have names. As I look around my desk right now, I can see dozens of things that don’t have names. We also often give things many names – first names, surnames, nicknames, abbreviated names, English/Spanish/Chinese/etc names. Flexibility in naming (no names, one name, multiple names, private names, etc) is obviously of great utility. Yet in computational systems we’re usually compelled to name things, and we’re restricted to a single name. These are just a couple of the problems I have with file systems – I have about 10 others, but will spare you.

Inconsistency and ambiguity are common in the real world. While they’re obviously often not helpful, there are times when it is very useful to have both. Things may become clearer over time. Systems evolve. In the natural world we use representations that allow high degrees of inconsistency and ambiguity, and it’s very useful to be able to do so – how else would we learn or get anything accomplished if not? Yet if you suggest a computational system that explicitly allows for any level of inconsistency and ambiguity in information, people start to get nervous or even upset. They’ll begin to argue with you, and suggest ways to “fix” the system to get rid of the undesirable qualities. Why is that?

Multiple organizations of the same information are very common in the natural world. We do it all the time. Computationally it’s rare that systems allow us to multiply organize things. That’s changing, thankfully, with the rise of tagging and with music collection software that allows multiple simultaneous playlists (or “smart” dynamic playlists) of the same underlying sound files. But those systems are the exception rather than the rule.

There’s an obsession with “meaning” and pinning down what things are “about” in the computational world. In the real world we don’t seem to care that much – we’re more concerned with utility. What’s a book for? Something to read? Something to stop other objects from blowing away? Something to be hollowed out to hold a gun? Something to create an intellectual impression? A decoration? Something to hold up other books? A hiding place? A book can be all these things, and we can move seamlessly between them. What’s a glass for? Is it a weapon? Something we can hold to the wall to hear a conversation? Something to use as an insect trap? A fingerprint capturing device? A musical instrument? Maybe even something to drink out of? We don’t really know, and it’s not important to know. We don’t obsess over the “meaning” of a glass or try to determine what single thing it might be “about” etc. We just use it as we see fit. Similarly we can’t anticipate how people will want to use information – and our storage architecture shouldn’t try. (You could counter by pointing out the FluidDB about tag. But usage of the about tag is entirely optional. It’s a convenience. You can make your own, or use none at all. And a FluidDB object can be about whatever you think it’s about and used for whatever you want to use it for – even if others have completely different interpretations of and uses for the same object. No problem.)

Later (meta?) data is often most usefully put with the original data. That’s what we commonly do in the real world. It’s convenient, easy, useful, and natural. For example, when you’re reading a book and you want to remember what page you’re up to, you can simply dog-ear the page or insert a bookmark. The extra information travels with the book. In the computational world you can’t do things like that unless a programmer has anticipated that you might want to and made provision for the extra information in the underlying data structure (or database). So we’re very often forced to put extra unanticipated information elsewhere – e.g., in a file, in our heads. Unfortunately, that later information is often the most important – because it’s generated by individuals who are trying to customize or personalize their computational world. I’ll have much more to say about that another time. For now: a writable architecture like FluidDB does not have this limitation because you can always put the new (meta?) information with the old (content?). And search on it, etc. That offers a fundamental change in how we work with information. I’ll blog about it at length one of these days. You get to think about it in the meantime 🙂

October 3, 2009

Fluidinfo as a universal metadata engine

Filed under: Essence — Terry Jones @ 3:56 am

Image: Jin Wicked

One way to use Fluidinfo, among many, is as a universal engine for metadata.

I’ll have to explain what I mean by that, especially since some people got the impression from the earlier post on data vs metadata that we don’t think metadata is important, or that it doesn’t exist, or similar. I tried to make it clear in the post, and in responding to the comments that followed, that that’s not what was meant. In fact, being a metadata engine for everything is one of the major initial goals of Fluidinfo. So that’s how important we think metadata is! The way to support metadata on anything is to have an underlying architecture that’s flexible enough to allow that to happen – without someone setting the thing up with an a priori determination of what’s meta- and what’s not. True support for metadata is too important for that – to do it properly you need the architecture to be neutral.

The question is: how can Fluidinfo be used as a universal metadata engine?

Metadata can be loosely defined as data that’s about other data. So to provide a universal metadata engine, any time any application wants to store some metadata (M) about anything (A), Fluidinfo should always have a place to put M. Moreover, the application shouldn’t have to stop to ask if it’s ok to store the metadata, and its needs shouldn’t have to be anticipated.

The key word in the above paragraph is about. Fluidinfo has an about tag that can (optionally) be added to objects to indicate what they’re about. There’s a lot that could be said about the about tag – in fact, the person who pushed for its inclusion in Fluidinfo, Nicholas Radcliffe, even started a blog of that name. The main thing to know for now is that the about tag on an object (if any) is immutable, and its value (always a string) is unique across all Fluidinfo objects.

To give some simple examples, there might be objects in Fluidinfo with about tags that have values such as isbn:140679239X or http://www.abebooks.com/servlet/BookDetailsPL?bi=588210745 or US:ZIP:90210 or info@fluidinfo.com or IP:207.171.166.252 or… anything you like. That’s the point.

So when an application – any application – wants to store information about A, it just asks Fluidinfo for the object whose about tag has the value A. If the object already exists, Fluidinfo returns it. If not, Fluidinfo creates a new object, sets its about value to A, and returns it.
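
The lookup-or-create behaviour described above can be sketched in a few lines of Python. This is a minimal in-memory model, not the Fluidinfo API: the `ObjectStore` class, its `object_about`, `set_tag` and `get_tag` methods, and the tag name `terry/rating` are all hypothetical names chosen for illustration.

```python
import uuid


class ObjectStore:
    """Toy in-memory model of about-tag lookup (hypothetical, not the real API)."""

    def __init__(self):
        self._by_about = {}   # about value -> object id
        self._tags = {}       # object id -> {tag name: value}

    def object_about(self, about):
        """Return the id of the object about `about`, creating it if needed."""
        if about not in self._by_about:
            object_id = str(uuid.uuid4())
            self._by_about[about] = object_id
            # The about value is set once, at creation, and never changed.
            self._tags[object_id] = {"about": about}
        return self._by_about[about]

    def set_tag(self, object_id, tag, value):
        """Attach a tag value to an object; the about tag cannot be rewritten."""
        if tag == "about":
            raise ValueError("the about tag is immutable")
        self._tags[object_id][tag] = value

    def get_tag(self, object_id, tag):
        """Read back a tag value from an object."""
        return self._tags[object_id][tag]


store = ObjectStore()
first = store.object_about("isbn:140679239X")
second = store.object_about("isbn:140679239X")   # same object both times
store.set_tag(first, "terry/rating", 5)
```

Because the about value is unique, two applications that independently ask for `isbn:140679239X` end up writing onto the same object, which is what makes the shared place to put metadata possible.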

That’s the first part of being a universal metadata engine: if you want to store information about something, Fluidinfo gives you an obvious place to put that information (provided you can convert your particular A into a string of some kind). In this regard, Fluidinfo is like a wiki. When you use a wiki, you can ask it for the page on any subject, and if it doesn’t exist it will be created. As with a wiki, you can think of Fluidinfo as already having objects about everything; just like a wiki, Fluidinfo doesn’t actually create any particular object until someone asks for it.

The second crucial component is Fluidinfo’s model of control. As mentioned in the Information. Naturally. post, Fluidinfo objects do not have owners. That means that all applications are guaranteed that they can store information onto the Fluidinfo object about A.

Putting these two together, you get something that starts to look very much like a universal metadata engine. Got some metadata to store about something? Fluidinfo gives you an obvious place to put it and a guarantee, in advance, that you’ll be able to do so. This is what we mean when we say Fluidinfo makes the world more writable.

To give a couple of quick examples, Emanuel Carnevale has written two JavaScript programs for the Firefox 3.5 Jetpack extension. These are just quick proofs of concept for now, but they will mature. One is fluidy-hood, which offers functionality along the lines of Google’s Sidewiki (though more general), and the second is BRB, the Borthwick Remember Button, in honor of John Borthwick of Betaworks who asked for one. These are very simple pieces of code that use Fluidinfo as a universal metadata engine, in both cases putting information onto the object that’s about the URL you’re currently looking at in Firefox.

A final comment about the creation of value: These tiny apps have limited and unremarkable value to their individual users. Things get much more interesting, though, when you consider that these applications are creating truly social data. It is directly searchable via the Fluidinfo query language. It can be combined with other information, both homogeneous (created by the same app being run by someone else) and heterogeneous (related but different information about the same thing created by other apps). It can be accessed, augmented, and mashed up by others. And the person who created the information can continue to control it: share it, protect it, edit it, delete it, etc.

When you look at data and applications like this, you begin to see why we’re so excited about the kinds of changes in how we work with information that we think Fluidinfo can help to introduce.

There’s a lot more that can be said regarding the about tag, about how all this affects customization, personalization, and information organization in general, about ambiguity and its resolution, and about the creation of value via putting information into context. Those things will have to wait for later blog postings, though.

Stay tuned.

September 10, 2009

The myriad benefits of a simple query language

Filed under: Essence — Terry Jones @ 12:50 am

Fluidinfo has a simple query language. If you are familiar with any other query language, you can probably learn the entire Fluidinfo language in a couple of minutes. The image below shows a summary of the whole language. Without going into details, you can immediately tell there’s not much to it. Click on the image to read more. In contrast, SQL is massive. The SQL 2008 standard comes in 9 parts, the second of which is over 1300 pages.

Fluidinfo query language summary

The downside to having such a simple query language is that complicated data retrieval, processing, and organization are not done server-side. Applications have to request data in a simpler fashion, process it locally, and make further network requests if they need additional related data.

The strong upside is that a deliberately simple query language permits architectural simplicity. Query processing is the most complex part of Fluidinfo, so the query language bounds the underlying complexity and has a direct influence on overall system implementation and architecture. Whereas a complex query language, such as SQL, makes it difficult to scale, a simple one makes scaling simpler (at least in theory; you still have to build it, of course!).

The trick is getting the balance right: design a query language that’s practical and useful for a wide variety of common tasks, but whose simplicity confers important architectural advantages.

Here are a few ways in which the Fluidinfo query language and the resultant architecture give us hope that we’re building something that can grow.

  • Complex queries are not possible. You can make a big query in Fluidinfo, or a deep query, or a query that returns many results, but you can’t make a complex query, by which I mean the kind of query that can bring an SQL server to its knees. Just for starters, the Fluidinfo query language has no JOIN operation. When a query language is complex, the database is at the mercy of its applications: applications can submit queries with JOINs that are so complex that the required data cannot reasonably be brought together (JOINed) in order for the selection to proceed.
  • All query resolution is simple. In the parse tree of any Fluidinfo query, all the leaves are simple. Each requires either a single lookup in a B-tree (or similar), or a single text match. The result of the processing at a leaf is always a set of object ids. The internal nodes of the query tree only require set operations (union, intersection, difference) on object ids. Below is a fragment of a query parse tree. There’s nothing else.

    A Fluidinfo query parse tree fragment

  • Parallelization is trivial. Because the values of Fluidinfo tags are stored separately, as in a column store, leaf queries are always sent in parallel to the independent servers that maintain the tags in question.
  • It scales horizontally. Because tag values are stored independently and internal query tree nodes are always simple set operations on object ids, the architecture is easy to scale horizontally. We built (and open-sourced) txAMQP to combine Thrift and AMQP with Twisted to give ourselves transparent messaging-mediated RPC. That means new servers can be deployed to run services that simply join or create the appropriate AMQP queues and immediately begin receiving RPC calls. When more tag servers or set operation servers are needed, it is trivial to add them.
  • Unused tags can be taken offline. Because tags are stored independently, those that have not been used for some time can have their values serialized and stored in a cheaper medium for the interim. They need not occupy expensive and scarce RAM. When they’re next queried—if ever—they can rapidly be brought back online. This is an architectural advantage that’s mainly made possible by the system design, not the query language simplicity. I’ve included it nevertheless, because this kind of optimization might not be possible in a system with a query language that demanded a more complex underlying data organization.
  • It can scale down as well as up. Just as scaling up by adding servers is simple, servers can be taken down during quieter periods. Set operations servers can simply disappear. Tag servers can migrate management of their tags to other servers or just take tags offline – they will be re-animated by another tag server when next needed.
  • Adaptive affinity is straightforward. When tags are frequently being queried together, they can be migrated to the same tag server. Then an entire sub-query involving both can be sent to that server and the result, just a set of object ids, flows up through the query tree exactly as it would have had the leaves been processed on separate servers. And when things get too hot, i.e., tags being stored together have created a hotspot, they can be migrated to separate servers.
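
The “all query resolution is simple” point in the list above can be illustrated with a small evaluator. This is a sketch under assumptions, not Fluidinfo’s implementation: the tuple-based tree format, the `evaluate` function, the operator names, and the sample tag names are all invented here, and a plain dict stands in for the per-tag index (a B-tree or similar in the description above). Leaves return sets of object ids, and internal nodes only combine those sets.

```python
# Per-tag storage: tag name -> {object id: value}. Each tag is stored
# independently, loosely mimicking a column store.
TAGS = {
    "alice/rating": {1: 5, 2: 3, 3: 4},
    "bob/seen": {2: True, 3: True, 4: True},
    "carol/title": {1: "Moby Dick", 3: "Dune"},
}


def evaluate(node):
    """Evaluate a query tree and return a set of object ids."""
    op = node[0]
    if op == "has":                      # leaf: objects carrying a tag
        return set(TAGS.get(node[1], {}))
    if op == "gt":                       # leaf: numeric comparison on one tag
        _, tag, bound = node
        return {oid for oid, v in TAGS.get(tag, {}).items() if v > bound}
    if op == "matches":                  # leaf: simple text match on one tag
        _, tag, text = node
        return {oid for oid, v in TAGS.get(tag, {}).items() if text in v}
    # Internal nodes: pure set operations on the children's results.
    left, right = evaluate(node[1]), evaluate(node[2])
    if op == "and":
        return left & right
    if op == "or":
        return left | right
    if op == "except":
        return left - right
    raise ValueError(f"unknown operator: {op}")


query = ("and", ("gt", "alice/rating", 3), ("has", "bob/seen"))
result = evaluate(query)   # objects alice rated above 3 that bob has seen
```

Since each leaf touches exactly one tag, the leaves could in principle be dispatched to separate servers, with only the resulting id sets travelling back up the tree.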

That’s enough for now. There are other, more detailed, advantages that I’ve omitted for brevity. I’m trying to keep each of these posts down to reasonable size.

September 5, 2009

Metadata vs Data: a wholly artificial distinction

Filed under: Essence — Terry Jones @ 9:15 pm

Image: psd

Computer scientists are fond of talking about metadata. There often seems to be an assumption that drawing a distinction between metadata and data is useful and perhaps even necessary.

At an architectural level, I think that’s entirely wrong. Any storage architecture that maintains a distinction between metadata and data has real problems that will limit its flexibility and usefulness. Note that I’m not saying that an application shouldn’t maintain a distinction between metadata and data, or that applications shouldn’t present things to users in those terms, or that it’s not useful to think in terms of metadata and data. I’m also not claiming that every storage architecture needs to be flexible – there are obviously times where that appears unnecessary (though in many cases you may end up wanting more flexibility).

I’ll simply argue that if you aim to build a storage architecture with real flexibility, maintaining a distinction between data and metadata runs directly counter to your goal. Below I’ll outline some reasons why.

But first, consider the natural world. If you talk to a regular person — meaning someone who’s not a computer scientist, a librarian, an archivist etc. — and ask them if they know what metadata is, you’ll probably draw a blank. Why is that? It’s because the distinction between data and metadata is entirely artificial. It does not exist in the real world, and it’s clear that regular people can get by just fine without it. Fluidinfo draws its inspiration from the way we work with information in the natural world, and maintains no such distinction.

It’s interesting to speculate on the origins of the metadata vs data distinction. I’d love to know its full history. I suspect that it arose from early architectural constraints, from the relative design and programming ease of maintaining a set of constant-size chunks of information about files apart from the dynamic and variable-size memory required by the contents of files. I suspect it probably also has to do with architectural limitations and the slowness of early machines.

Here then are the main reasons why the distinction is harmful.

  • Two access methods: When metadata and data are stored separately, the way to get at those two different things is likely to be different. Consider inodes in a UNIX filesystem versus the disk blocks containing file data. They are stored differently and cannot be accessed in a uniform way. This causes internal complexity for the storage architecture.
  • Two permissions systems: There are likely to be two permissions systems governing changes to metadata and data. This is another source of internal complexity for the architecture.
  • Search across the two is complex or impossible: Why has it traditionally been so hard to find, for example, a file with “accounts” in its name and “automobiles” in the contents? Because this is a simultaneous search across file metadata and file content. The division between metadata (the name) and the data (the content) made such searches extremely difficult. Even with modern systems it’s awkward. Consider the UNIX find command which searches based on file metadata and the grep command which searches file contents. Combining the two is not easy. It’s at least possible in some systems these days, but that’s because those systems pull all the information together and build a separate index on it – i.e., they allow it by removing the division between metadata and data.
  • A central piece of content: Systems, especially document or file systems, usually maintain a distinction between the content and the metadata about the content. But the real world doesn’t work that way. You may possess information about something without having the thing. There may be no pieces of content, or there may be many.
  • Who decides?: If a system maintains a distinction between metadata and data, who decides which is which? Almost inevitably, it’s a programmer, a system architect, or a product manager who makes those decisions. There’s an implicit assertion that they know more about your information than you do. They decide what should be in the metadata. While there are systems that let users create metadata, they are usually limited in scope – someone has decided in advance how much metadata a regular user should be allowed to create, what kind of metadata it can be, how it will be used, how users will be allowed to search on it, etc. The intentions are good, but the whole thing smacks of parental control, of hand-holding, of “trust us, we know better than you do”.
  • Time dependency at creation: Systems maintaining the distinction also introduce an unnatural time dependency. Until the content (i.e., the data) is available, there’s nowhere to put the metadata. E.g., a file object has to be created before it can have metadata, a web page has to come into existence before you can tag it. But the real world doesn’t work that way. E.g., you can have an opinion about someone you’ve never met, or someone who’s dead or fictional. You can have a summary of a call agenda before the call happens, or notes about a meeting before the minutes of the meeting are prepared.
  • Time dependency at deletion: The awkward time dependency bites when the content is deleted too. The metadata necessarily vanishes because the architecture doesn’t allow it to persist: there’s literally nowhere to put it. Once again, the real world doesn’t work that way. E.g., you’re sent a large image file of someone’s pet cat – you take a look and, to show you care, make a mental note of its name and breed, but you delete the image because you don’t want to store it. Or suppose you give away or lose your copy of Moby Dick – you don’t therefore immediately forget the book’s title, its plot, the author, the name of the main character, an idea of how long it is, the book’s first line, etc. The “content” is gone, but the metadata remains. You may have never owned the book, you may think you have a copy but do not, you may have two copies – in the natural world it just doesn’t matter, and nor should it in a storage architecture. Interestingly, Amazon are currently being sued because they threw away someone’s metadata in the process of removing a copy of Orwell’s 1984 from a Kindle. You can bet the metadata was removed automatically when the content was removed.
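
As an illustration of the search bullet above, here is a small Python sketch of the “accounts in the name, automobiles in the contents” search on an ordinary filesystem. It has to test the file name (metadata) and the file contents (data) through two different mechanisms and combine them by hand. The function name `find_named_containing` and the sample files are made up for this example.

```python
import tempfile
from pathlib import Path


def find_named_containing(root, name_part, content_part):
    """Return sorted file names under root whose name contains name_part
    and whose text content contains content_part."""
    matches = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if name_part not in path.name:           # metadata test (like find)
            continue
        text = path.read_text(errors="ignore")   # data test (like grep)
        if content_part in text:
            matches.append(path.name)
    return sorted(matches)


with tempfile.TemporaryDirectory() as root:
    Path(root, "accounts-2009.txt").write_text("automobiles: 3 leased")
    Path(root, "accounts-2008.txt").write_text("office furniture")
    Path(root, "notes.txt").write_text("automobiles to sell")
    found = find_named_containing(root, "accounts", "automobiles")
```

The sketch works, but only by walking every file and reading its contents; there is no single index or query that covers both kinds of information.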

OK, enough examples for now.

Fluidinfo has none of the problems listed above. It has absolutely no distinction between metadata and data. It has a single permissions system that mediates access to all information. When a tag (perhaps used or presented as the “content” by an application) is removed from an object, all the other tags remain. There is no distinction between important system information and the information stored by any regular user or application – they’re all on an equal footing, and that includes future applications and users. No-one gets to set the rules about what’s more important and what’s not; there’s simply no distinction. You can search on anything, using a single query language – the system uses the query language to find things it needs, just like any other application. The single permissions system mediates who can do what – equally and uniformly.
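
To make the “all the other tags remain” point concrete, here is a toy model in Python. It is only a sketch: an object is represented as a plain dict of tag names to values, and the tag names (`amazon/text`, `terry/rating`, `terry/notes`) and helper functions are hypothetical.

```python
# One object, modelled as a flat mapping of tag name -> value. No tag is
# marked as "the content"; that is purely an application-level convention.
book = {
    "amazon/text": "Call me Ishmael. ...",   # treated as content by one app
    "terry/rating": 9,
    "terry/notes": "Reread the chapter on whiteness.",
}


def remove_tag(obj, tag):
    """Remove one tag; every other tag on the object is left untouched."""
    obj.pop(tag, None)


def has_tags(obj, *tags):
    """A uniform test that works for any tag, 'content' or not."""
    return all(tag in obj for tag in tags)


remove_tag(book, "amazon/text")
remaining = sorted(book)
```

After the removal, the rating and notes are still there and can still be searched on, because nothing in the model ties their existence to the tag that an application happened to treat as content.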

I used to argue that everything should just be considered data. But I think David Weinberger puts it better in Everything is Miscellaneous where he says it’s all metadata. Call it what you will, it’s clear (to me at least) that at a fundamental level there should be no distinction.

BTW, if you’re into self-reference, you might also be interested to know that Fluidinfo uses itself to implement its permissions system. Permissions are just more information, after all. Fluidinfo stores that information for tags, namespaces, and users onto the regular Fluidinfo objects that are about those things. There truly is no metadata / data distinction. It’s a little like Lisp: once you have the core system in place, you can (and should) use it to implement the wider system.

August 28, 2009

Information. Naturally.

Filed under: Essence — Terry Jones @ 3:36 am

Image: Mary Hodder

From the Fluidinfo home page:

Humans are diverse and unpredictable. We create, share, and organize information in an infinity of ways. We’ve even built machines to process it. Yet for all their capacity and speed, using computers to work with information is often awkward and frustrating. We are allowed very little of the spontaneity that characterizes normal human behavior. Our needs must be anticipated in advance by programmers. Far too often we can look, but not touch.

Why isn’t it easier to work with information using a computer?

At Fluidinfo we believe the answer lies in information architecture. A rigid underlying platform inhibits or prevents spontaneity. A new information architecture could be the basis for a new class of applications. It could provide freedom and flexibility to all applications, and these advantages could be passed on to users.

Fluidinfo does not attempt to directly model information accumulation and use in the real world. It simply provides an information architecture that is more flexible than the ones we’re used to. It provides a fairly simple answer to the question of how we might work with information more naturally when using a computer. It does not claim to be the final word on the subject, but points out a fruitful direction for advance. And it provides a concrete implementation that can be used today.

The fruitful direction

The computational world is too read-only, and too tightly controlled. Most of the time we spend using a computer, we are either 1) strictly in read-only mode or 2) using an application that allows us to write, but only in predetermined ways. In contrast, in our normal dealings with the natural world, when using our brains, we are never in read-only mode. We are constantly processing information and adding (i.e., writing) to our mental models. I’m talking about everyday things, like noticing something and remembering that you noticed. Or seeing something you like, and being able to recall that fact later. Even in these trivial acts we are in some sense writing—laying down memories that can later be recalled, sorted amongst, shared, organized, merged, or put aside for long periods or even forever.

In thinking about this extreme fluidity, I find it illustrative to consider how we work with concepts. As I wrote in Kaleidoscope: 10 takes on Fluidinfo:

Concepts are very fluid: they don’t have owners, you don’t ask for permission to add to them, they have no formal structure or central piece of content, they can be organized in many ways, and they have no pre-defined set of qualities or attributes. Exactly the same can be said of Fluidinfo objects.

The fruitful direction—and the mission of Fluidinfo, if I may be so grandiose and dramatic—is to engineer an information architecture with a fluidity similar to that of concepts, in order to make the world more writable. The question (and the point of this posting) is how?

Objects without owners

The answer is actually very simple: Fluidinfo provides support for information objects that do not have top-level owners. These objects are composed of tags (with values), for which there is a flexible and secure permissions model. Because the objects don’t have owners, anyone, or any application, can add tags to any object it can find. These objects have all the nice properties of concepts mentioned above.
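
A rough way to picture this is the following Python sketch. It assumes, purely for illustration, that control lives on tags rather than on objects; the `Obj` class, the `WRITERS` table, the `add_tag` function, and the user and tag names are all hypothetical and are not the Fluidinfo API.

```python
class PermissionDenied(Exception):
    """Raised when a user tries to write a tag they do not control."""


class Obj:
    """An object with no owner: just a bag of tags."""

    def __init__(self):
        self.tags = {}


# tag name -> set of users allowed to write it (hypothetical policy store)
WRITERS = {
    "alice/rating": {"alice"},
    "bob/comment": {"bob"},
}


def add_tag(user, obj, tag, value):
    """Anyone may tag any object, but only with tags they may write."""
    if user not in WRITERS.get(tag, set()):
        raise PermissionDenied(f"{user} may not write {tag}")
    obj.tags[tag] = value


thing = Obj()
add_tag("alice", thing, "alice/rating", 4)
add_tag("bob", thing, "bob/comment", "useful")   # no permission from alice needed

try:
    add_tag("bob", thing, "alice/rating", 1)     # bob cannot overwrite alice's tag
    blocked = False
except PermissionDenied:
    blocked = True
```

Note that nothing in the sketch checks who “owns” `thing`. The only question asked is whether the user may write the tag in question, which is why adding information to an existing object never requires anyone’s permission.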

That’s it?

While there’s a lot more to Fluidinfo than having objects with no owners, this single change is the key to the architecture and is responsible for its generality and flexibility. It’s almost embarrassingly simple.

It takes a while for the implications of this twist to sink in. I’m not going to go into the details in this post. I’ll just point out that a simple change in representation can have a surprisingly profound effect. I’ve already written about this, though without giving details of Fluidinfo.

I’m fascinated by representation and its role in problem solving. How can such a simple piece of mental jujitsu result in fundamental change? I’ll describe the consequences in later posts. There are several of them, and I think they’re important. For now though, if you’re interested, please follow the link above to see some simple examples of the power of changing an underlying representation.

August 25, 2009

Kaleidoscope: 10 takes on Fluidinfo

Filed under: Essence — Terry Jones @ 5:42 am

I’ve been asked what Fluidinfo is hundreds of times. I’ve never really known how to answer because it can be looked at from many different angles. As I try to answer, I often feel like I’m holding up a large opaque crystal in front of me, turning it this way and that, until I find an angle that makes sense for this particular listener. I came slowly to the realization that there is no perfect answer, and that Fluidinfo can be many things to many different people. It’s like looking through a kaleidoscope: keep turning it until you see it in a way that’s attractive.

I’ll try to explain some of these points of view in later posts. For now, I’ll just give the flavor of some of them.

The many possible views of Fluidinfo do not mean that it is complex. It’s actually very simple. But it has a flexibility and generality that can make it difficult to grasp. Even when you do understand it, it’s common to be able to imagine two or three ways you might use it to solve a particular problem. I’m going to save a description of the object model for later, too.

In no particular order then, here are 10 ways of thinking about what Fluidinfo might be, or become.

1. A database with the heart of a wiki: The wiki analogy is strong in some ways: Anyone can add data (but see below), applications can collaborate, data put in a shared place is more valuable, and abstractly there are Fluidinfo objects for every purpose just like there are wiki pages for every subject – all waiting for someone to create them. But the analogy is very poor in others: Fluidinfo has a strong permissions system that can prevent others from changing or even seeing your data, there is a query language, and content is typed. So Fluidinfo has the flavor of a wiki, but when you get right down to it almost everything is quite unlike a wiki.

There is an interesting related question here. The world of encyclopedias was tightly controlled, and very few people were allowed to write – the encyclopedia was the ultimate read-only authority. Comically, it seemed, Wikipedia was the exact opposite. Yet in the space of a few years, the unthinkable had happened: Wikipedia had eclipsed even the mighty Encyclopedia Britannica. Can Fluidinfo do for applications and traditional databases what Wikipedia did for humans and traditional encyclopedias? You don’t have the rigid tables or schema, anyone can write, content can evolve, and there is no top down control.

2. A metadata store for everyone and everything: Fluidinfo has a special “about” tag. You can use it to ask for the object that’s about something, like a URL or an ISBN. That gives applications an easy shared place to put things about other things – i.e., to store metadata. The metadata can be users’ customization or personalization information, ratings, opinions, whatever.

3. A store of concepts: Concepts are very fluid: they don’t have owners, you don’t ask for permission to add to them, they have no formal structure or central piece of content, they can be organized in many ways, and they have no pre-defined set of qualities or attributes. Exactly the same can be said of Fluidinfo objects.

4. A platform for mashups: When a programmer makes a mashup, combining information from different sources to create information about something, where should that new information be stored? The usual answer today is to put the new information in a database, behind a new API, to document it, to get a server, to keep it running. In effect this is just making another hoop for future programmers to jump through to make an additional mashup. In Fluidinfo you can put the new information with the old—because objects don’t have owners. And that’s where it’s probably most valuable, because it’s then immediately available to be mashed up with other data on the same object, and search can target heterogeneous (i.e., mashed up) data on the same object.

5. A way of storing social graphs: Because users each have a Fluidinfo object associated with them, it is very easy to build social graphs. For example, user Andy might put an andy/i-follow tag onto the object for user Betty. If you have a few people doing that, interesting queries are then immediately possible—both within and across social networks.

6. A new way of organizing information: When we organize things, we are creating new information. Normally we store that new information elsewhere. When you can store it on the objects that are being organized, lots of nice things happen. I have an upcoming blog post that I’m tempted to title “Multiple simultaneous non-conflicting dynamic sharable organizations.” A bit of a mouthful, but nevertheless true.

You can also build all data structures from tags in Fluidinfo. They’re slower to use than data structures in a typical program, but where you lose in speed you gain in flexibility.

7. Something that frees us from APIs/UIs: APIs and UIs are usually regarded in a positive way: they make getting to information easy for programs and people. But they also control us. We can only do what they allow and what they anticipate. Tight limits are imposed on us in getting to our own data. Fluidinfo can change this: you can own your own data, you can always add data and customize, and you can directly search on anything you like.

8. A communication system: You can look at Fluidinfo objects as places for cooperating applications to exchange information. The information could be messages, jobs and results, etc. Nicholas Radcliffe, who has understood Fluidinfo for years, today found a new pleasing angle to look at it from, as a Twitter for data.

You could easily use a Fluidinfo object as a voting box, with some nice properties (e.g., retract or change your vote, verify that someone voted without being able to read their vote). And you can do more complex things, too.

9. An evolutionary data system: Fluidinfo allows reputation, trust, and convention to evolve. Its namespaces, tags, and users all have objects, and these give natural places to accumulate fitness information. Conventions will evolve for naming and tag values, just as they do for tags and hashtags. Selection pressure will take care of fixing ambiguities exactly to the extent that it’s important and worthwhile to fix them.

10. An alist on everything: One of the oddest moments ever in trying to explain Fluidinfo came when talking to Paul Graham. After at least 10 minutes of trying to find an angle for him, he finally said “oh, I get it. It’s an alist on everything.” I smiled, breathed a sigh of relief, and said yes. “Well, why didn’t you say so?” replied Paul. Just goes to show you can never have too much background in computer science.
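
For readers without the Lisp background: an alist (association list) is just a list of key/value pairs. Three of the takes above (2, 5 and 10) can be sketched together in a few lines of Python. This is a toy, not Fluidinfo's API: the function names, the example tag names other than andy/i-follow, and the choice to identify a user's object by the user's name are all assumptions made for illustration.

```python
# about value -> alist of (tag, value) pairs: literally "an alist on everything".
objects = {}


def object_about(about):
    """Return the alist for the thing `about`, creating an empty one if needed."""
    return objects.setdefault(about, [])


def put(about, tag, value):
    """Put a (tag, value) pair on the object, replacing any earlier value."""
    alist = object_about(about)
    alist[:] = [pair for pair in alist if pair[0] != tag]
    alist.append((tag, value))


def get(about, tag):
    """Return the value of `tag` on the object, or None if it is absent."""
    for key, value in object_about(about):
        if key == tag:
            return value
    return None


def query_has(tag):
    """Return the about values of all objects carrying `tag`."""
    return sorted(
        about
        for about, alist in objects.items()
        if any(key == tag for key, _ in alist)
    )


# Take 2: two hypothetical applications store metadata about the same URL.
put("http://example.com/", "ratings-app/stars", 5)
put("http://example.com/", "bookmarks-app/saved", True)

# Take 5: Andy and Carol tag the objects of the users they follow.
put("betty", "andy/i-follow", True)
put("carol", "andy/i-follow", True)
put("betty", "carol/i-follow", True)

print(objects["http://example.com/"])
print(query_has("andy/i-follow"))  # the users Andy follows
```

Because both applications ask for the object about the same URL, their data ends up on one alist, and the "who does Andy follow?" question becomes a query for objects carrying a single tag.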

August 24, 2009

Truly social data

Filed under: Essence — Terry Jones @ 4:44 am

This is the first in a series of posts that will describe why we’ve built Fluidinfo and what we think it’s good for.

Fluidinfo exists, in part, to address an increasingly apparent mismatch. Humans are extremely social, almost inescapably social. I won’t go into evolutionary history, though. Many of us also use computers that are connected to the internet. Sitting in Barcelona, I can now connect to a machine on the other side of the world in milliseconds. We all can.

Put billions of intensely social humans in front of computers connected to a global network that ties them all together, and what do you get? Humans trying to be social using computers.

The mismatch—and the missing piece—is social data.

I don’t think we can get to truly social data while applications maintain tight control over their data. Even calling it their data is likely wrong, as much of the data ingested and stored by applications comes from users and might be in a sense owned by the user. But even that is wrong: how can you own information? You can only think you own it.

These days applications are increasingly open. But things are still far too locked down. The standard way to open one’s data today is to provide an API to let others get at it. But custom APIs are not the answer to truly social data. An API, like a user interface, only lets you do to information what the people controlling access have decided to allow you to do. You can only do what’s been anticipated. You often have to ask permission. You can’t add things to the data as you please.

Data will only be truly social when you can work with it in the kinds of ways we work with information in the real, non-computational, world. In the real world we don’t ask for permission to have an opinion on something, to add to the ball of information surrounding a concept. Our needs don’t have to be anticipated by programmers. We can share information as we please. For example, nobody owns the concept of Barcelona. If I want to essentially “tag” Barcelona as being hot, or noisy, or beautiful, I just do it. I can keep my opinion private, I can share it with certain others, I can hold conflicting opinions, I can organize things in multiple ways at the same time and give things many names.

Fluidinfo lets you do all of the above, and then some.

The main way in which it does this is by changing the control over information. In Fluidinfo, objects (which can correspond to anything – web pages, files, people, movies, ideas, etc.) do not have an owner. Any application or user is free to add information to any object. There’s a strong and flexible permissions system, but permissions are applied at the level of the tags (with values) on objects, not at the level of the object itself.
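
As a rough illustration of what tag-level permissions could look like, here is a small Python sketch. It is only a model under stated assumptions, not Fluidinfo's permissions system: the Tag class, the readers set, and the tag names are hypothetical. It shows one ownerless object whose tags are each visible to a different audience.

```python
class Tag:
    """A tag value plus the set of users allowed to read it (hypothetical)."""

    def __init__(self, value, readers=None):
        self.value = value
        # None means anyone may read; otherwise only the listed users may.
        self.readers = readers


def visible_tags(obj, user):
    """Return the tags on `obj` that `user` is allowed to see."""
    return {
        name: tag.value
        for name, tag in obj.items()
        if tag.readers is None or user in tag.readers
    }


# One ownerless object, here a plain dict of tag name -> Tag.
barcelona = {
    "terry/opinion": Tag("beautiful"),                  # public
    "terry/private-note": Tag("too noisy", {"terry"}),  # private
    "betty/visited": Tag(True, {"betty", "terry"}),     # shared with Terry
}

print(visible_tags(barcelona, "carol"))
print(visible_tags(barcelona, "betty"))
```

There is no permission on the object itself in this sketch: what a given user sees is decided tag by tag.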

The reason this is so different and much more social is that Fluidinfo gives applications and their users a world in which they can always contribute information. It can take us from a default read-only world to one in which we can all write. Without stopping to ask if it’s ok, and without anyone having to anticipate what we might one day want to do.

In a world of truly social data, any user will be able to customize or personalize anything. You’ll be able to say “I ate there” or “that’s cool” or “that sucks!” or “I know that person” or “I want one of them” or “I’ve read that” or “Hey mum, look at this” or … do pretty much whatever you want. I can think of hundreds of examples, making them up at will; you only have to think of the kinds of things you’re used to doing in the real world. Your contributions will be just as important as any other. You’ll be able to search based on your data, or any selection of your friends’ data. You’ll be able to combine your information with heterogeneous information created by others. You’ll be able to augment, organize, and selectively share information as you please.

The nonexistence of truly social data is the huge missing piece in the puzzle of today’s computer applications. It’s the pain we’re all feeling, but which we’re so used to that we don’t realize it. The problem won’t be solved at the level of the application. Truly social data will be the foundation of a new class of applications that all benefit by storing at least some of their information into a common social data architecture. The ones that don’t will be left behind. Because when you think about it, all data is social.

And you can guess the rest: That’s what Fluidinfo is all about. Stay tuned.
