Autovivification in Python: nested defaultdicts with a specific final type
I quite often miss the flexibility of autovivification in Python. That Wikipedia link shows a cute way to get what Perl has:
from collections import defaultdict
def tree():
return defaultdict(tree)
lupin = tree()
lupin["express"][3] = "stand and deliver"
It’s interesting to think about what’s going on in the above code and why it works. I really like defaultdict.
What I more often want, though, is not infinitely deep nested dictionaries like the above, but a (known) finite number of nested defaultdicts with a specific type at the final level. Here’s a tiny function I wrote to get just that:
from collections import defaultdict
def autovivify(levels=1, final=dict):
return (defaultdict(final) if levels < 2 else
defaultdict(lambda: autovivify(levels - 1, final)))
So let's say you were counting the number of occurrences of words said by people by year, month, and day in a chat room. You'd write:
words = autovivify(5, int)
words["sam"][2012][5][25]["hello"] += 1
words["sue"][2012][5][24]["today"] += 1
Etc. It's pretty trivial, but it was fun to think about how to do it with a function and to play with some alternatives. You could do it manually with nested lambdas:
words = defaultdict(lambda: defaultdict(int))
words["sam"]["hello"] += 1
But that gets messy quickly and isn't nearly as much fun.
You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.
May 26th, 2012 at 7:11 am
You’re a sick man, Jones. Great stuff!
May 27th, 2012 at 9:02 am
Thanks for sharing this. I’ve been looking for an elegant way to tackle this kind of problem for a while now.
June 1st, 2012 at 2:01 pm
[…] article on autovivification…yeah, I’d never heard of it […]
June 3rd, 2012 at 11:16 am
This feels more natural and easier to me — am I missing something?
words = defaultdict(lambda: 0)
words[“sam”, 2012, 5, 25, “hello”] += 1
June 3rd, 2012 at 12:28 pm
Hi James. It depends what you want / need. If your solution works for you then that’s great – it’s what we used to do using awk all the time. The way I’ve done it offers several additional possibilities if you need them. For example you can get a list of all the things at any level of the hierarchy (you might want to sort them, loop over them etc.). You can pass one of the intermediate dicts to some other process. You don’t have to worry or check that one of the keys contains an underscore. You can put anything hashable (e.g., a tuple) into the structure directly without having to convert it to a string, etc. It’s all a matter of what you need to do with the data once it’s inserted.
Thanks for commenting!
June 3rd, 2012 at 4:03 pm
even more fun using object attributes as keys of defaultdict:
class objdict(defaultdict):
def __getattr__(self, key):
try:
return self.__dict__[key]
except KeyError:
return self.__getitem__(key)
__setattr__ = lambda self, k, v: self.__setitem__(k,v)
def objtree():
return objdict(objtree)
obj = objtree()
obj.x.y.z = 4 ### this is easier than obj[“x”][“y”][“z”] = 4
June 3rd, 2012 at 5:02 pm
Nice!!
February 7th, 2015 at 7:12 pm
this saved my life!! thanks
February 7th, 2015 at 8:28 pm
Ha! You’re welcome :-)
December 6th, 2016 at 1:46 am
How would one print the values of each set of keys? This was really helpful–thank you!
February 23rd, 2017 at 1:34 pm
What if you want to call items() method?