Tuesday, June 24, 2008

Configuring TortoiseMerge for use with Mercurial

I've been using Mercurial primarily on a Mac, but also on a Windows box for a client. Since the Windows box isn't my primary platform, it tends to be under-configured/under-tricked-out. After my first conflicted hg merge, it was time to set up a merge program.
I guess Mercurial looks through the registry where it was finding some wacked out description for P4 merge. (The client currently uses P4.) It was referring to some path that didn't exist.
I have TortoiseSVN on the machine from work for a different client, so I figured I'd use its merge program. Googling around, I didn't find an example, so here you go (even though it's not real hard - see MergeToolConfiguration for details):

TortoiseMerge.args=/mine:$local /theirs:$other /base:$base -o /merged:$output


Tuesday, May 27, 2008

DVCS (Mercurial) for Students

I just had a bit of an epiphany for yet another reason why distributed version control systems (DVCS) like Mercurial rock. In an advanced software engineering class (e.g., a capstone project), it would be appropriate to have project teams using a SCM/VCS tool. At my campus, we've never pushed that because it can be a pain to set up a server for students to access. Gotta have a dedicated host for it. Firewalls have to be open. Permissions and users have to be set up. Yada, yada, yada.

However, with a DVCS tool like Mercurial, it would be trivial and wouldn't require any networking at all. Students could use the modern equivalent of SneakerNet: ThumbNet. Put the whole repo on a thumb drive, meet up with your project partners, push and pull, done. Or, if the project is small enough - just email whole repos around. In this context, "distributed" also means you don't need any support from The Man, and that's a good thing.

For better or worse, I don't teach those classes, but if I did....


Friday, May 16, 2008

@Override is your friend

When Java 5 came out, I had my head down teaching and wasn't really paying attention. I've since started working with it and have been using annotations for "big" tasks like Hibernate/JPA metadata. I was pretty underwhelmed by @Override - one of the only stock annotations. When I implement toString, I know I'm overriding the method on Object. Who cares?

The other day I was beating my head into the desk trying to figure out why a Swing table wasn't editable. I had overridden the isCellEditable method on JTable, but the cells weren't editable. Then, I remembered something from the annotations tutorial I'd read at some point: "While it's not required ... it helps to prevent errors." So, I added @Override, and sure enough - I'd misspelled the method name, just the sort of error that @Override can prevent.

I've got the religion. And like any recent convert, I suggest you get it too.


Monday, May 12, 2008

Data Driven, my eye!

I've started using the WebTest web testing framework. Mostly, it's pretty cool. However, I have a bone to pick with the screencast demonstrating the dataDriven task.

  1. There's a "slide" that says "Do you know the dataDriven Ant task?" I know of no such standard Ant task. It turns out that it's specific to WebTest. Not really clear.
  2. They show no configuration steps to use it, implying that it works out-of-the-box. I don't know if my environment is wacked (I installed from the developer build, as they suggested), but I had to add an Ant taskdef referring to com.canoo.ant.task.PropertyTableTask, and I only found that by looking in the source.
  3. The screencast shows running ant at the command line, which is how I've been running my tests, but I had to run their webtest script instead. Again, maybe my installation is wacked.
  4. I really wish it could handle data in a TSV/CSV/text file, since I don't have Excel installed on the machine where I'm running these tests, but it only seems to accept an xls file.
  5. Just to add insult to injury, Google Spreadsheet (which I'm using to generate the data file) seems to append a bunch of empty lines to my spreadsheet, which causes the dataDriven task to repeat the last line 90-odd times.
Charles - aka Cranky Pants.

OK, so it sucked to be me, but not any more. I figured out my various issues with the dataDriven task. It turns out that the screencast (clearly) shows them developing in the tests directory of a WebTest project. I missed that and tried to use dataDriven in the top-level build.xml in a target that didn't declare wt.defineTasks as a dependency. Guess what wt.defineTasks does - yup, it does the taskdef.
I coupled that breakthrough with breaking down and using Excel to create the xls file (instead of exporting from Google) and voilà - I'm livin' large and no longer cranky.


P.S. Apologies to Cheech and Chong.

Wednesday, May 07, 2008


Apple has released Java6 for the Mac, which I have been eagerly awaiting. However, their download page is a bit contradictory:

This is limited to 64-bit Intel Macs (which is fine with me), but yet they still call it Universal. What's Universal about that? The only way I could imagine it being less universal is if they specified a number of cores or CPU speed.

Anyway, Java6 is good stuff for those of us on Universal Core2 Duo iMacs.


Saturday, March 29, 2008

Netbeans is a memory hog?

I was looking at Activity Monitor on my iMac, and I couldn't read the size entry for the Netbeans process that was running. So, I selected it and got a detail report:

I knew that Java and Netbeans could be memory hogs, but 16 million TB (16 exabytes) is a bit much. Good thing I have virtual memory! (I didn't alter the image, but clearly it's a bug with Activity Monitor and/or OS X 10.5.2.)


Thursday, February 28, 2008

Simple, complete example of Python getstate and setstate

I've been doing serialization and/or object-relational mapping in languages like C++ and Java for at least 15 years. I've known about Python's serialization facility (the pickle and cPickle modules) for as long as they've existed, but I've never had a need to use them. Recently, I needed to pickle an object to store in memcached to reduce database traffic.

Wouldn't you know it - the first class I try to pickle throws an exception because it contains some attributes that can't be serialized. I couldn't figure out where the problem was because the exception and traceback didn't include the name of the attribute that contained the threading lock that couldn't be serialized. However, a quick look at the code revealed a couple of suspects.

Even though I didn't know which attributes were causing the problem, I knew that the only solution would be to take control of the serialization process. Once I could pick and choose which attributes were being pickled, I could search for the offender(s). As it turned out, both of my initial suspects were guilty of evading pickling.

From the documentation on pickling, I could see that I needed to implement the __getstate__ and __setstate__ methods, but it wasn't clear what those methods need to look like. I found an example online, but the guy was having problems (it was posted to a mailing list), and as I implemented my own methods, I realized what his problem was. So, here's the code:

def __getstate__(self):
    result = self.__dict__.copy()
    del result['log']
    del result['cfg']
    return result

The problems I was having with pickling were the logging and configuration attributes. These needed to be removed from the object before pickling. Fortunately, they're not unique to the instance, so they're easy to recreate during unpickling.

As you can tell, __getstate__ returns a dictionary of the object's state. By default (if you didn't implement the method), this is just the __dict__ member. To exclude some attributes, we just need to delete the keys from the dictionary. However, the crucial step is that you have to make a (shallow) copy of __dict__ first. Otherwise, deleting the keys from the dictionary is the same as deleting the attributes from the instance, which would be bad. (This is where the other example I found online failed - he didn't make a copy.)
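
To make the copy issue concrete, here's a minimal sketch. The NoCopy and WithCopy classes are made up for illustration (they're not from my real code), and I call __getstate__ directly rather than going through pickle so the side effect is easy to see:

```python
class NoCopy:
    """Hypothetical class whose __getstate__ forgets to copy __dict__."""
    def __init__(self):
        self.name = "widget"
        self.log = object()  # stand-in for an unpicklable logger

    def __getstate__(self):
        result = self.__dict__  # same dictionary the instance uses
        del result['log']       # so this also removes self.log
        return result


class WithCopy(NoCopy):
    """Same class, but __getstate__ works on a shallow copy."""
    def __getstate__(self):
        result = self.__dict__.copy()
        del result['log']       # only the copy loses the key
        return result


broken = NoCopy()
broken.__getstate__()
print(hasattr(broken, 'log'))   # the instance lost its log attribute

intact = WithCopy()
intact.__getstate__()
print(hasattr(intact, 'log'))   # the instance is untouched
```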

The __setstate__ method is the reverse, only we don't have to mess with copies:

def __setstate__(self, state):
    self.__dict__ = state
    # log and cfg were dropped in __getstate__, so recreate them here
    self.cfg = getConfig()
    self.log = getLog()
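
Putting the two methods together, here's a sketch of a full round trip you can run. The Job and Resource classes and the bodies of getConfig and getLog are stand-ins I made up for this example; the real ones return my application's configuration and logger. Each stand-in holds a threading lock, which is what makes it unpicklable:

```python
import pickle
import threading


class Resource:
    """Hypothetical stand-in for a config or logger object."""
    def __init__(self, label):
        self.label = label
        self.lock = threading.Lock()  # locks can't be pickled


def getConfig():
    return Resource("cfg")  # assumption: the real one loads shared config


def getLog():
    return Resource("log")  # assumption: the real one returns a logger


class Job:
    def __init__(self, name):
        self.name = name
        self.cfg = getConfig()
        self.log = getLog()

    def __getstate__(self):
        result = self.__dict__.copy()  # copy so the instance keeps its attributes
        del result['log']
        del result['cfg']
        return result

    def __setstate__(self, state):
        self.__dict__ = state
        self.cfg = getConfig()  # recreate what __getstate__ dropped
        self.log = getLog()


job = Job("nightly")
clone = pickle.loads(pickle.dumps(job))
print(clone.name, clone.cfg.label, clone.log.label)
print(hasattr(job, 'log'))  # the original still has its attributes
```

Without the two methods, pickle.dumps(job) fails with a TypeError as soon as it reaches one of the locks.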


Saturday, February 23, 2008

Mac OS X 10.5.2 did not completely fix stacks

Apple's newest update to Leopard, 10.5.2, has greatly improved the new Stack feature by adding a hierarchical list view, but it is still not as functional as the list view in Tiger. As I noted before, I created my own directory that has a collection of aliases (symbolic links) to the applications I use most frequently, as well as links to Applications, Utilities, and the LocalApps directory where I install third-party applications. The problem is that even with the 10.5.2 update, the list view does not follow the symbolic links, so my folder of links is basically useless.

I'm still pleased with the list view - it is a huge improvement, but I won't be totally satisfied until it follows links.


Sunday, February 17, 2008

Simply Mercurial

Mercurial is the easiest revision control system I've used, "and so can you" to quote Stephen Colbert. I became interested in the idea of a distributed SCM tool in order to keep my revision history with me while I'm on the road and not necessarily connected. I would have assumed that to get that power, the tool would be more complex - you can't get something for free, right? However, Mercurial is so easy to use, I'm using it for simple one-off revision needs.

Consider the case of a lone developer with a modest number of files to keep track of. To use Mercurial, all he needs to do is change into the directory where the files are and run:

hg init

That creates a repository, hidden in the .hg subdirectory, and sets the directory up as a working directory. The hg status command shows that none of the files is under control, yet. Running hg add * (or whatever subset of the files is appropriate) marks all of the files to be added to the repository. Finally, hg commit commits the files.

The real beauty was in that first step - hg init. That is so much easier than CVS or Subversion where you either have to create a new repository or figure out where in an existing repository you want to put these files. And it's easier than the dinosaurs, RCS and SCCS, where you have to set up subdirectories to hold the version files in every subdirectory - not to mention the fact that those tools don't really deal with multiple users.

Mercurial is about as simple as can be, and if you never work with multiple developers or pass changes around between repositories, then it stays that simple. Period.


Saturday, February 02, 2008

Complete example of __getattr__ in Python

I've always known about the __getattr__ function on classes in Python and how it could theoretically be used. However, I never had a real need for it, and so I had never actually implemented __getattr__. For whatever reason, it was a tad more difficult than I thought, so I figured I'd share an example with y'all.

In case you don't already know, the idea is that any time code references an attribute of an object (e.g., obj.x) that Python can't find through the normal lookup, __getattr__ gets called to fetch or compute the value of the attribute. This function can do almost anything, but you must be careful when referencing attributes of self inside it, because a reference to a missing attribute will trigger a recursive call to __getattr__.

In my case, I was writing some code for unit testing. I needed to create a mock object that's used to store configuration information for the system. In the real object, every configuration attribute is initialized from an ini file parsed by ConfigParser in the constructor. For testing, I didn't want to have a huge configuration file for every test. So, I wanted to create a system that performed lazy initialization of the data attributes - i.e., only look in the ini file if we actually need a given item, and if the attribute is never referenced, we never need to fetch it from the ini file. Therefore, the ini file only needs the attributes that are actually used by a given test. Implementing __getattr__ is the way to hook into the process to provide this lazy initialization.

The basic outline/algorithm is:
  1. If the attribute already exists on self, return that value
  2. Fetch/compute the missing value
  3. Store the value on self for subsequent use
  4. Return the value
The key to making this work and avoiding infinite recursion is the __dict__ attribute, which is a (regular) dictionary, the keys of which are attributes that exist on the object and the values are the values of the attributes. We can access these keys and values without going through __getattr__, thus avoiding recursion.

def __getattr__(self, attrName):
    # "in" replaces has_key, which newer Pythons no longer have
    if attrName not in self.__dict__:
        value = self.fetchAttr(attrName) # computes the value
        self.__dict__[attrName] = value
    return self.__dict__[attrName]

It's pretty straightforward. In retrospect, I'm not sure what tripped me up when I first went to implement it. In the end, the fetchAttr function ended up being pretty fancy, but I'll write more about that later. You gotta love a dynamic language like Python that makes this as simple as it is, even if it does require a bunch of underscores.
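
In the meantime, here's a sketch of how the pieces might fit together in a runnable form. This is not my real fetchAttr: the LazyConfig class, the INI_TEXT contents, the "settings" section name, and the fetch_count counter are all invented for this example, and it uses the Python 3 configparser module.

```python
import configparser

# Hypothetical ini contents; a test only lists the items it actually uses.
INI_TEXT = """
[settings]
host = example.test
port = 8080
"""


class LazyConfig:
    """Mock config that reads an ini value only on first use."""
    def __init__(self, ini_text, section="settings"):
        self._parser = configparser.ConfigParser()
        self._parser.read_string(ini_text)
        self._section = section
        self.fetch_count = 0  # only here to show how often we hit the ini data

    def fetchAttr(self, attrName):
        self.fetch_count += 1
        try:
            return self._parser.get(self._section, attrName)
        except configparser.NoOptionError:
            # Keep the usual contract so hasattr() and friends still work.
            raise AttributeError(attrName)

    def __getattr__(self, attrName):
        if attrName not in self.__dict__:
            value = self.fetchAttr(attrName)
            self.__dict__[attrName] = value
        return self.__dict__[attrName]


cfg = LazyConfig(INI_TEXT)
print(cfg.host)         # first access goes through fetchAttr
print(cfg.host)         # now found in __dict__, so __getattr__ isn't called
print(cfg.fetch_count)  # → 1
```

Note that everything set in the constructor lands in __dict__ the normal way, so fetchAttr can use self._parser without re-entering __getattr__.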