Posted Saturday, May 26th, 2012 at 12:10 am under python, tech.

Autovivification in Python: nested defaultdicts with a specific final type

I quite often miss the flexibility of autovivification in Python. That Wikipedia link shows a cute way to get what Perl has:

from collections import defaultdict

def tree():
    return defaultdict(tree)

lupin = tree()
lupin["express"][3] = "stand and deliver"

It’s interesting to think about what’s going on in the above code and why it works. I really like defaultdict.

What I more often want, though, is not infinitely deep nested dictionaries like the above, but a (known) finite number of nested defaultdicts with a specific type at the final level. Here’s a tiny function I wrote to get just that:

from collections import defaultdict

def autovivify(levels=1, final=dict):
    return (defaultdict(final) if levels < 2 else
            defaultdict(lambda: autovivify(levels – 1, final)))

So let’s say you were counting the number of occurrences of words said by people by year, month, and day in a chat room. You’d write:

words = autovivify(5, int)

words["sam"][2012][5][25]["hello"] += 1
words["sue"][2012][5][24]["today"] += 1

Etc. It’s pretty trivial, but it was fun to think about how to do it with a function and to play with some alternatives. You could do it manually with nested lambdas:

words = defaultdict(lambda: defaultdict(int))
words["sam"]["hello"] += 1

But that gets messy quickly and isn’t nearly as much fun.

  • Charlotte

    How would one print the values of each set of keys? This was really helpful–thank you!

  • Ha! You’re welcome :-)

  • nim

    this saved my life!! thanks

  • Nice!!

  • Roman Evstifeev

    even more fun using object attributes as keys of defaultdict:

    class objdict(defaultdict):
        def __getattr__(self, key):
                return self.__dict__[key]
            except KeyError:
                return self.__getitem__(key)
        __setattr__ = lambda self, k, v: self.__setitem__(k,v)

    def objtree():
        return objdict(objtree)

    obj = objtree()
    obj.x.y.z = 4   ### this is easier than obj[“x”][“y”][“z”] = 4

  • Hi James.  It depends what you want / need. If your solution works for you then that’s great – it’s what we used to do using awk all the time. The way I’ve done it offers several additional possibilities if you need them. For example you can get a list of all the things at any level of the hierarchy (you might want to sort them, loop over them etc.). You can pass one of the intermediate dicts to some other process. You don’t have to worry or check that one of the keys contains an underscore. You can put anything hashable (e.g., a tuple) into the structure directly without having to convert it to a string, etc. It’s all a matter of what you need to do with the data once it’s inserted.

    Thanks for commenting!

  • This feels more natural and easier to me — am I missing something?

    words = defaultdict(lambda: 0)
    words[“sam”, 2012, 5, 25, “hello”] += 1

  • Pingback: Weekly Python Links Roundup (6/1/2012) « The Mouse Vs. The Python()

  • Jens Geiregat

    Thanks for sharing this. I’ve been looking for an elegant way to tackle this kind of problem for a while now.

  • Steve Holden

    You’re a sick man, Jones. Great stuff!