Indicating (shared) interest in things without disclosing what they are

March 5th, 2011 by Terry Jones. Filed under Awesomeness, Essence, Howto, Programming.

Imagine you wanted to tell the world you were interested in something, for example an email address or a phone number, without telling the world what that thing was. That may not sound so interesting, but if several people were doing the same thing, it would be a mechanism for discovering private things you had in common, without telling anyone else what those things were.

Russell Manley and I just thought of a simple way to do this using Fluidinfo. Here’s how we did it for the email addresses we know.

For each email address, compute its MD5 sum. Then, put a rustlem/knows or terrycojones/knows tag onto the object whose fluiddb/about value is the MD5 sum. The MD5 algorithm is essentially one-way, so even if someone finds a Fluidinfo object with either of our tags on it (which is trivial) they cannot recover the original email address.
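As a minimal sketch of the hashing step (written for Python 3, with a hypothetical helper name about_value and a made-up address), the about value for a thing could be computed like this:

```python
import hashlib


def about_value(thing):
    """Return the hex MD5 digest to use as the fluiddb/about value."""
    # Strip surrounding whitespace so a trailing newline does not change the hash.
    return hashlib.md5(thing.strip().encode("utf-8")).hexdigest()


# Made-up address, used only for illustration.
print(about_value("alice@example.com"))
```

Note that everyone has to hash exactly the same string to land on the same object: Alice@example.com and alice@example.com give different digests, so in practice some agreed normalization (not shown here) would help.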

This is pretty nice. We’re independently indicating things of interest, but neither of us is publicly saying what those things are. Because we’re putting our information onto the same objects in Fluidinfo, we can then easily discover things we have in common with each other (and with others), without the world knowing what. We can do the same thing for phone numbers, or anything else.

Getting the data into Fluidinfo was trivial. Here’s code I used to put a terrycojones/knows tag (with value True) onto the appropriate objects:

import sys, hashlib
from fom.session import Fluid

fdb = Fluid()
fdb.login('terrycojones', 'PASSWORD')

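for thing in sys.stdin.readlines():
    # Drop the trailing newline before hashing.
    about = hashlib.md5(thing.rstrip('\n')).hexdigest()
    # Assumption: put the tag (value True) on the object via fom's about API.
    fdb.about[about]['terrycojones/knows'].put(True)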

You pass a list of email addresses to this script on standard input.

Russell and I each had about a thousand email addresses in our address books. A first question is how many addresses we know in common. You can get the answer to this with the simple Fluidinfo query “has terrycojones/knows and has rustlem/knows”. It turns out there are 53 common addresses. But the results don’t tell us which addresses those are, which is also interesting.
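To see why the query result reveals a count and a set of hashes but not the addresses themselves, here is a small local model of it (a Python 3 sketch using plain sets rather than Fluidinfo; the address books and helper names are made up):

```python
import hashlib


def md5_hex(thing):
    """Hex MD5 digest of a string, as used for the fluiddb/about value."""
    return hashlib.md5(thing.encode("utf-8")).hexdigest()


def tagged_objects(address_book):
    """The set of about values a user would tag from their address book."""
    return {md5_hex(address) for address in address_book}


# Made-up address books, for illustration only.
terry = ["alice@example.com", "bob@example.com", "carol@example.com"]
russell = ["bob@example.com", "dave@example.com", "carol@example.com"]

# Roughly what "has terrycojones/knows and has rustlem/knows" selects.
in_common = tagged_objects(terry) & tagged_objects(russell)

print(len(in_common))  # 2
for about in sorted(in_common):
    print(about)  # only the digests are visible, not the addresses
```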

We also wrote a small script to print any tags ending in /knows for a set of email addresses given on the command line:

import sys, hashlib
from fom.session import Fluid
from fom.errors import Fluid404Error
fdb = Fluid()

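for thing in sys.argv[1:]:
    about = hashlib.md5(thing).hexdigest()
    print thing, about
    try:
        # List every tag on the object and keep those ending in /knows.
        for tag in fdb.about[about].get().value['tagPaths']:
            if tag.endswith('/knows'):
                print '\t', tag
    except Fluid404Error:
        # No object exists yet for this MD5 sum.
        print '\tunknown'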

So given an email address, we can run the above and see who else knows (or claims to know) that email address.

We find all this quite thought provoking. Without going into details of the social side of this, it’s worth pointing out that Fluidinfo makes this kind of information sharing very easy because it has a guaranteed writable object for everything, including all MD5 sums. Because the fluiddb/about tag is unique and isn’t owned by anyone, any user can add their knows tag to the object for any MD5 sum. The ability for users and applications to work independently and yet to share information by just following a fluiddb/about convention is one of the coolest things about Fluidinfo.

Finally, note that this system does not guarantee privacy. If someone already knows an email address or phone number (etc.), they can compute its MD5 sum and examine the Fluidinfo tags on the corresponding object. Doing so, they might see a rustlem/knows tag and would then be free to draw their own conclusion.
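The caveat can be made concrete with a short sketch (Python 3; the function name recover, the tagged hashes and the guesses are all hypothetical). Anyone holding a list of candidate addresses can hash them and match the digests against tagged objects:

```python
import hashlib


def md5_hex(thing):
    """Hex MD5 digest of a string."""
    return hashlib.md5(thing.encode("utf-8")).hexdigest()


def recover(tagged_hashes, candidates):
    """Map each tagged hash back to a candidate string that hashes to it."""
    lookup = {md5_hex(candidate): candidate for candidate in candidates}
    return {h: lookup[h] for h in tagged_hashes if h in lookup}


# Hypothetical: digests of objects carrying someone's knows tag.
tagged = {md5_hex("bob@example.com"), md5_hex("carol@example.com")}

# Addresses the observer already knows or can guess.
guesses = ["alice@example.com", "bob@example.com", "zed@example.com"]

found = recover(tagged, guesses)
print(sorted(found.values()))  # ['bob@example.com']
```

Only addresses the observer could already guess are exposed; the digest for carol@example.com stays opaque here because it is not in the candidate list.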

You can play too. All you need is a Fluidinfo account and the above code. Please let us know how you get on. For example, you can freely tweet any MD5 sums we have in common. We’re going to use the hashtag #incommon.

  • Sharing the MD5 sum is actually not that safe. Companies with huge databases of e-mail addresses have compiled MD5 sums of those addresses and made them available online. Using one of those, you can reverse-look-up some e-mail addresses from their MD5 sums. Rainbow tables, pretty much.

    Other than that, I like the idea of sharing knowledge about secret stuff in public, even though that metadata could be used to decipher the secret too: “What friends do those two have in common?”

  • Hi Emil

    Yes, agree with all that. Even using SHA256 (or whatever) doesn’t make it secure. But being secure isn’t really the point. We were in part trying to not directly publish people’s email addresses, and also the idea was to do this for many things, not just emails (e.g., phone numbers, genetic markers). The lookup that you describe is always going to be a factor. But there’s a deliberate vagueness in the tagging – the source text is something I know about, but I don’t say in what way. Is it an email address that I received mail from, sent mail to, harvested, overheard, am following, am interested in for some other reason, that I’d like to meet, or am I just bluffing, etc? I find it all quite interesting, as you’re taking something that’s entirely private and putting it (in some sense) in a place that’s entirely public, yet you’re not saying what it is or what your interest is. So there’s some kind of intermediate social signalling that’s made possible, and that makes me think a lot.

    I’ve long planned to do the same for phone numbers, but *without* the MD5 sums. I wouldn’t say that the number is a phone number, just that it’s a number I may or may not have an interest in (e.g., I could tag some random numbers too). People can interpret them however they like. They could take them to be phone numbers and start calling them, but they wouldn’t know who they were calling (supposing it was a phone number) and the callee would likely tell them they had a wrong number & hang up. But if you started with a number and found my tag on the object, you really do know something – or at least there’s some evidence that you might. Again, the social dynamics around this are something I find really interesting.

    Thanks for the comment. What are your thoughts?

  • Here is a similar story

    Who’s to say when we should disclose those sometimes unpleasant, yet vitally important things about ourselves? After the 3rd, 5th, or 14th date? If you reveal too much of yourself too soon, you stand the risk of scaring the other person off. Conversely, if you withhold too much information, you are seen as secretive and untrustworthy. Unfortunately there is no universal disclosure code in the dating game which makes it so hard.

  • This makes me curious to see how fluidinfo users are connected by the objects they tag. Is it possible to get a list of objects that a user has tagged? I took a quick look at the User API, but didn’t see it there :/

  • Hi Eric!

    No. You’d need to walk their tags and do a “has” query on each one. That’s not too hard…

  • I developed a similar technique 2 years ago to track objects in Nova Initia. While obviously hashes aren’t completely secure, that wasn’t the goal. We track objects on URLs, but we didn’t want to sacrifice player privacy. This allows us to track player-created objects without having knowledge of where that object is. It has worked beautifully. Terry, I’d like to speak with you further on this subject when you get some time.