Python dictionary keys

  • updated 2012/07/18 to improve the text: the initial version was clearly a work-in-progress

This is an old blog post which I wrote in November of 2011 but which I never published, until now that is.

At work I stumbled onto a subtle bug when I reimplemented a class whose objects where used as a key in a dictionary. Look at the following situation, where I create two dictionaries with the same key:

>>> class B(object):
...   def __init__(self, name):
...     self.name = name
...
>>> pieter = B("Pieter")
>>> name2birthday = { pieter: "September 24th" }
>>> name2nationality = { pieter: "Dutch" }
>>> print name2birthday.items()
[(<__main__.B object at 0x80ab26c>, 'September 24th')]
>>> print name2nationality.items()
[(<__main__.B object at 0x80ab26c>, 'Dutch')]

Because the two dictionaries use the same key, I can iterate over the keys of the first dictionary to iterate over the keys of the second dictionary:

>>> for key, value in name2birthday.items():
...   print name2birthday[key]
...   print name2nationality[key]
...
September 24th
Dutch
>>>

But then I pickle and unpickle the two dictionaries and try to repeat the same trick:

>>> import pickle
>>> pickled_name2birthday = pickle.dumps(name2birthday)
>>> pickled_name2nationality = pickle.dumps(name2nationality)
>>> name2birthday = pickle.loads(pickled_name2birthday)
>>> name2nationality = pickle.loads(pickled_name2nationality)
>>> for key, value in name2birthday.items():
...   print name2birthday[key]
...   print name2nationality[key]
...
September 24th
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
KeyError: <__main__.B object at 0x80ad1ec>
>>>

Huh, the original situation was not as clear to debug as the example above, but it turns out that pickling and unpickling the dictionaries creates not one new key object but two distinct key objects, one for name2birthday and one for name2nationality.

As soon as I suspected this, I knew I had to read up a little on the way Python implements dictionaries. The following article was extremely helpful: http://wiki.python.org/moin/DictionaryKeys

Python dictionaries use a hash function to find the bucket that stores the object in a O(1) time:

all user defined types are usable as dictionary keys with hash(object) defaulting to id(object), and cmp(object1, object2) defaulting to cmp(id(object1), id(object2))

The id function returns the address of the object and directly after creation, there was one pieter object that was used as the key. However after unpickling, there were two pieter objects. This meant I could not use the default hash function.

Furthermore the aforementioned Python Wiki page states the following:

To be used as a dictionary key, an object must support the hash function (e.g. through __hash__), equality comparison (e.g. through __eq__ or __cmp__), and must satisfy the correctness condition above.

The equality comparison is used to find the right key object in case the hash function results in the same bucket for two different objects. The correctness condition mentioned states that if two key objects have the same hash value, they are equivalent. sSo I extended class B with its own __hash__ and __eq__ functions:

>>> class B(object):
...   def __init__(self, name):
...     self.name = name
...   def __hash__(self):
...     return hash(self.name)
...   def __eq__(self, other):
...     return self.name == other.name
...
>>> pieter = B("Pieter")
>>> name2birthday = { pieter: "September 24th" }
>>> name2nationality = { pieter: "Dutch" }
>>> pickled_name2birthday = pickle.dumps(name2birthday)
>>> pickled_name2nationality = pickle.dumps(name2nationality)
>>> name2birthday = pickle.loads(pickled_name2birthday)
>>> name2nationality = pickle.loads(pickled_name2nationality)
>>> for key, value in name2birthday.items():
...   print name2birthday[key]
...   print name2nationality[key]
...
September 24th
Dutch
>>>

Voilà!

Comments

Comments powered by Disqus