Thursday, February 28, 2008

Simple, complete example of Python getstate and setstate

I've been doing serialization and/or object-relational mapping in languages like C++ and Java for at least 15 years. I've known about Python's serialization facility (the pickle and cPickle modules) for as long as they've existed, but I've never had a need to use them. Recently, I needed to pickle an object to store in memcached to reduce database traffic.

Wouldn't you know it - the first class I tried to pickle threw an exception because it contained some attributes that couldn't be serialized. I couldn't figure out where the problem was because the exception and traceback didn't include the name of the attribute that contained the unpicklable threading lock. However, a quick look at the code revealed a couple of suspects.

Even though I didn't know which attributes were causing the problem, I knew that the only solution would be to take control of the serialization process. Once I could pick and choose which attributes were being pickled, I could search for the offender(s). As it turned out, both of my initial suspects were guilty of evading pickling.
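One way to hunt for the offenders is to try pickling each attribute on its own. This is only a rough sketch: the find_unpicklable helper and the Worker class are hypothetical names made up for illustration, not part of my real code.

```python
import pickle
import threading


def find_unpicklable(obj):
    """Return the names of attributes in obj.__dict__ that fail to pickle."""
    offenders = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:  # pickle may raise TypeError, PicklingError, etc.
            offenders.append(name)
    return offenders


class Worker(object):  # hypothetical class, for illustration only
    def __init__(self):
        self.name = "worker-1"
        self.lock = threading.Lock()  # locks cannot be pickled


bad_attrs = find_unpicklable(Worker())
print(bad_attrs)
```

Note that this only checks the top level of the instance; an attribute that merely contains a lock somewhere inside it will be reported by its own name, not by the name of the lock.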

From the documentation on pickling, I could see that I needed to implement the __getstate__ and __setstate__ methods, but it wasn't clear what those methods needed to look like. I found an example online, but the guy was having problems (it was posted to a mailing list), and as I implemented my own methods, I realized what his problem was. So, here's the code:

def __getstate__(self):
    result = self.__dict__.copy()
    del result['log']
    del result['cfg']
    return result

The problem I was having with pickling was the logging and configuration attributes. These needed to be removed from the object's state before pickling. Fortunately, they're not unique to the instance, so they're easy to recreate during unpickling.

As you can tell, __getstate__ returns a dictionary of the object's state. By default (if you didn't implement the method), this is just the __dict__ member. To exclude some attributes, we just need to delete the keys from the dictionary. However, the crucial step is that you have to make a (shallow) copy of __dict__ first. Otherwise, deleting the keys from the dictionary is the same as deleting the attributes from the instance, which would be bad. (This is where the other example I found online failed - he didn't make a copy.)
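To see why the copy matters, here is a small sketch that calls __getstate__ directly, the same way pickle would. The Broken and Fixed classes are hypothetical, and the log attribute is just a string standing in for a real logger.

```python
class Broken(object):  # hypothetical: deletes from __dict__ directly
    def __init__(self):
        self.data = [1, 2, 3]
        self.log = "stand-in for a logger"

    def __getstate__(self):
        result = self.__dict__  # no copy: this IS the instance's dict
        del result['log']       # so this also removes self.log
        return result


class Fixed(Broken):  # hypothetical: works on a shallow copy instead
    def __getstate__(self):
        result = self.__dict__.copy()
        del result['log']
        return result


broken, fixed = Broken(), Fixed()
broken.__getstate__()
fixed.__getstate__()

broken_has_log = hasattr(broken, 'log')  # attribute is gone after "pickling"
fixed_has_log = hasattr(fixed, 'log')    # attribute survives
print(broken_has_log, fixed_has_log)
```

Without the copy, the instance loses its log attribute the first time it gets pickled, and the next call to __getstate__ fails with a KeyError.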

The __setstate__ method is the reverse, only we don't have to mess with copies:

def __setstate__(self, state):
    self.__dict__ = state
    # recreate the attributes that __getstate__ left out
    self.cfg = getConfig()
    self.log = getLog()
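Putting the two methods together, a full round trip might look like the sketch below. The Job and FakeResource classes are hypothetical, and the getConfig and getLog functions here are stand-ins I made up so the example runs on its own; the real ones return my actual configuration and logger objects.

```python
import pickle
import threading


class FakeResource(object):
    """Hypothetical stand-in for a logger or config object holding a lock."""
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()  # makes the object unpicklable


def getConfig():  # stand-in for the real helper (assumption)
    return FakeResource("cfg")


def getLog():  # stand-in for the real helper (assumption)
    return FakeResource("log")


class Job(object):  # hypothetical class, for illustration only
    def __init__(self, job_id):
        self.job_id = job_id
        self.cfg = getConfig()
        self.log = getLog()

    def __getstate__(self):
        result = self.__dict__.copy()  # shallow copy; instance is untouched
        del result['log']
        del result['cfg']
        return result

    def __setstate__(self, state):
        self.__dict__ = state
        self.cfg = getConfig()  # recreate what __getstate__ dropped
        self.log = getLog()


original = Job(42)
restored = pickle.loads(pickle.dumps(original))
print(restored.job_id, restored.cfg.name, restored.log.name)
```

The restored object gets fresh cfg and log attributes, while the original keeps its own because __getstate__ only touched the copy.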


Enjoy,
Charles.

3 comments:

PJE said...

"""As you can tell, __getstate__ returns a dictionary of the object's state."""

It doesn't have to. You could return a tuple, a list, a string, a number, or any other pickle-able value you like, if that's the only state you have.

Charles Anderson said...

Good point. I hadn't thought of that, but so long as setstate and getstate are in sync with each other, that would work. It could even be a string, but that would be kinda silly since pickling is trying to create a string.
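A minimal sketch of that idea, using a tuple as the state. The Point class is hypothetical, just for illustration.

```python
import pickle


class Point(object):  # hypothetical class, for illustration only
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        return (self.x, self.y)  # any picklable value works as the state

    def __setstate__(self, state):
        self.x, self.y = state  # must unpack exactly what __getstate__ packed


p = pickle.loads(pickle.dumps(Point(3, 4)))
print(p.x, p.y)
```

One thing to watch for: if __getstate__ returns a false value (an empty tuple, None, 0), __setstate__ is not called at all on unpickling.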

denis said...

""" cPickle Dotdict(dict) works w protocol 0 but not -1
"""
import cPickle
import sys

class Dotdict(dict):
    """ d.x -> d["x"] """
    def __getattr__(self, attr):
        return self.get(attr, None)
    def __getstate__(self): # dumps
        return self
    def __setstate__(self, s): # loads
        self = s # ??

if __name__ == "__main__":
    protocol = 0
    if sys.argv[1:]:
        exec( "\n".join( sys.argv[1:] ))

    d = Dotdict( a=1 )
    print d.a, d["a"], d.get( "no", 42 ), d, d.keys(), "%(a)s" % d
    # 1 1 42 {'a': 1} ['a'] 1
    # print "%(no)s" % d

    ddump = cPickle.dumps( d, protocol )
    # -1: TypeError: 'NoneType' object is not callable
    l = cPickle.loads( ddump )
    print "loads:", l
    assert d == l

(grr no pre code ?)