Thursday, February 28, 2008

Simple, complete example of Python getstate and setstate

I've been doing serialization and/or object-relation mapping in languages like C++ and Java for at least 15 years. I've known about Python's serialization facility (the pickle and cPickle modules) for as long as they've existed, but I've never had a need to use them. Recently, I needed to pickle an object to store in memcached to reduce database traffic.

Wouldn't you know it - the first class I try to pickle throws an exception because it contains some attributes that can't be serialized. I couldn't figure out where the problem was because the Exception and trace back didn't include the name of the attribute that contained the threading lock that couldn't be serialized. However, a quick look at the code revealed a couple of suspects.

Even though I didn't know which attributes were causing the problem, I knew that the only solution would be to take control of the serialization process. Once I could pick and choose which attributes were being pickled, I could search for the offender(s). As it turned out, both of my initial suspects were guilty of evading pickling.

From the documentation on pickling, I could see that implementing the __getstate__ and __setstate__ methods, but it wasn't clear what those methods need to look like. I found an example online, but the guy was having problems (it was posted to a mailing list), and as I implemented my own methods, I realized what his problem was. So, here's the code:

def __getstate__(self):
result = self.__dict__.copy()
del result['log']
del result['cfg']
return result

The problem I was having with pickling were the logging and configuration attributes. These needed to be removed from the object before pickling. Fortunately, they're not unique to the instance, so they're easy to recreate during unpickling.

As you can tell, __getstate__ returns a dictionary of the object's state. By default (if you didn't implement the method), this is just the __dict__ member. To exclude some attributes, we just need to delete the keys from the dictionary. However, the crucial step is that you have to make a (shallow) copy of __dict__ first. Otherwise, deleting the keys from the dictionary is the same as deleting the attributes from the instance, which would be bad. (This is where the other example I found online failed - he didn't make a copy.)

The __setstate__ method is the reverse, only we don't have to mess with copies:

def __setstate__(self, dict):
self.__dict__ = dict
cfg = self.cfg = getConfig()
self.log = getLog()


Saturday, February 23, 2008

Mac OS X 10.5.2 did not completely fix stacks

Apple's newest update to Leopard, 10.5.2, has greatly improved the new Stack feature by adding a hierarchical list view, but it is still not as functional as the list view in Tiger. As I noted before, I created my own directory that has a collection of aliases (symbolic links) to the applications I use most frequently, as well as links to Applications, Utilities, and the LocalApps directory where I install third-party applications. The problem is even with the 10.5.2 update, the list view does not follow the symbolic links, so my folder of links is basically useless.

I'm still pleased with the list view - it is a huge improvement, but I won't be totally satisfied until it follows links.


Sunday, February 17, 2008

Simply Mercurial

Mercurial is the easiest revision control system I've used, "and so can you" to quote Stephen Colbert. I became interested in the idea of a distributed SCM tool in order to keep my revision history with me while I'm on the road and not necessarily connected. I would have assumed that to get that power, the tool would be more complex - you can't get something for free, right? However, Mercurial is so easy to use, I'm using it for simple one-off revision needs.

Consider the case of a lone developer with a modest number of files to keep track of. To use Mercurial, all he needs to do change into the directory where the files are and run:hg init
That creates a repository, hidden in the .hg subdirectory, and sets the directory up as a working directory. The hg status command shows that none of the files is under control, yet. Running hg add * (or whatever subset of the files is appropriate) marks all of the files to be added to the repository. Finally, hg commit commits the files.

The real beauty was in that first step - hg init. That is so much easier than CVS or Subversion where you either have to create a new repository or figure out where in an existing repository you want to put these files. And it's easier than the dinosaurs, RCS and SCCS, where you have to set up subdirectories to hold the version files in every subdirectory - not to mention the fact that those tools don't really deal with multiple users.

Mercurial is about as simple as can be, and if you never work with multiple developers and passing changes around between developers and repositories, then it stays that simple. Period.


Saturday, February 02, 2008

Complete example of __getattr_ in Python

I've always known about the _getattr_ function on classes in Python and how it could theoretically be used. However, I never had a real need to implement it, and so I had never actually implemented __getattr__. For whatever reason, it was a tad more difficult that I thought, so I figured I'd share an example with you'all.

In case you don't already know, the idea is that any time any code makes are reference to an attribute a class (e.g., obj.x ), __getattr__ gets called to fetch or compute the value of the attribute. This function can do almost anything, but you must be careful when making references to attributes, because that will trigger a recursive call to __getattr__.

In my case, I was writing some code for unit testing. I needed to create a mock object that's used to store configuration information for the system. In the real object, every configuration attribute is initialized from an ini file parsed by ConfigParser in the constructor. For testing, didn't want to have a huge configuration file for every test. So, I wanted to create a system that performed lazy initialization of the data attributes - i.e., only look in the ini file if we actually need a given item, and if the attribute is never referenced, we never need to fetch it from the ini file. Therefore, the ini file only needs the attributes that are actually used by a given test. Implementing __getattr__ is the way to hook into the process to provide this lazy initialization.

The basic outline/algorithm is:
  1. If the attribute already exists on self, return that value
  2. Fetch/compute the missing value
  3. Store the value on self for subsequent use
  4. Return the value
The key to making this work and avoiding infinite recursion is the __dict__ attribute, which is a (regular) dictionary, the keys of which are attributes that exist on the object and the values are the values of the attributes. We can access these keys and values without going through __getattr__, thus avoiding recursion.

def __getattr__(self, attrName):
if not self.__dict__.has_key(attrName):
value = self.fetchAttr(attrName) # computes the value
self.__dict__[attrName] = value
return self.__dict__[attrName]

It's pretty straightforward. In retrospect, I'm not sure what tripped me up when I first went to implement it. In the end, the fetchAttr function ended up being pretty fancy, but I'll write more about that later. You gotta love a dynamic language like Python that makes this as simple as it is, even if it does require a bunch of underscores.