Add to Technorati Favorites

Autovivification in Python: nested defaultdicts with a specific final type

I quite often miss the flexibility of autovivification in Python. That Wikipedia link shows a cute way to get what Perl has:

from collections import defaultdict

def tree():
    return defaultdict(tree)

lupin = tree()
lupin["express"][3] = "stand and deliver"

It’s interesting to think about what’s going on in the above code and why it works. I really like defaultdict.

What I more often want, though, is not infinitely deep nested dictionaries like the above, but a (known) finite number of nested defaultdicts with a specific type at the final level. Here’s a tiny function I wrote to get just that:

from collections import defaultdict

def autovivify(levels=1, final=dict):
    return (defaultdict(final) if levels < 2 else
            defaultdict(lambda: autovivify(levels - 1, final)))

So let's say you were counting the number of occurrences of words said by people by year, month, and day in a chat room. You'd write:

words = autovivify(5, int)

words["sam"][2012][5][25]["hello"] += 1
words["sue"][2012][5][24]["today"] += 1

Etc. It's pretty trivial, but it was fun to think about how to do it with a function and to play with some alternatives. You could do it manually with nested lambdas:

words = defaultdict(lambda: defaultdict(int))
words["sam"]["hello"] += 1

But that gets messy quickly and isn't nearly as much fun.


You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.

11 Responses to “Autovivification in Python: nested defaultdicts with a specific final type”

  1. Steve Holden Says:

    You’re a sick man, Jones. Great stuff!

  2. Jens Geiregat Says:

    Thanks for sharing this. I’ve been looking for an elegant way to tackle this kind of problem for a while now.

  3. […] article on autovivification…yeah, I’d never heard of it […]

  4. This feels more natural and easier to me — am I missing something?

    words = defaultdict(lambda: 0)
    words[“sam”, 2012, 5, 25, “hello”] += 1

  5. Hi James.  It depends what you want / need. If your solution works for you then that’s great – it’s what we used to do using awk all the time. The way I’ve done it offers several additional possibilities if you need them. For example you can get a list of all the things at any level of the hierarchy (you might want to sort them, loop over them etc.). You can pass one of the intermediate dicts to some other process. You don’t have to worry or check that one of the keys contains an underscore. You can put anything hashable (e.g., a tuple) into the structure directly without having to convert it to a string, etc. It’s all a matter of what you need to do with the data once it’s inserted.

    Thanks for commenting!

  6. Roman Evstifeev Says:

    even more fun using object attributes as keys of defaultdict:

    class objdict(defaultdict):
        def __getattr__(self, key):
            try:
                return self.__dict__[key]
            except KeyError:
                return self.__getitem__(key)
               
        __setattr__ = lambda self, k, v: self.__setitem__(k,v)

    def objtree():
        return objdict(objtree)

    obj = objtree()
    obj.x.y.z = 4   ### this is easier than obj[“x”][“y”][“z”] = 4

  7. Nice!!

  8. this saved my life!! thanks

  9. Ha! You’re welcome :-)

  10. How would one print the values of each set of keys? This was really helpful–thank you!

  11. Marcus Wiess Says:

    What if you want to call items() method?