Monday, November 25, 2013

Active Record - joins + include Methods Causing an Unintended Join

I recently fixed a problem in my Rails 3.2 app where I was using both the joins and include methods in an Active Record query, and it was triggering a join that I didn't want.  WTF?  Why are you using include and joins, and you don't want a join?

I needed to run a query on table A and I needed to apply criteria against another table B.  Thus, I needed to (inner) join those two with the joins method.    For the rows of A that met the search criteria, I wanted to eagerly the corresponding rows from tables X, Y, and Z.  Of course, I wanted to avoid a 3N+1 query situation.  So, I also used the includes method.

Typically, the includes method generates a query by IDs for the related objects.  In my case, I was getting four INNER JOINs - one each for B, X, Y, and Z.  Under "normal" circumstance, maybe this would have been OK, but my problem is table Y is in a separate database, and you can't join across databases.  (You can't really do transactions across databases, either.)

My original code used an array of named associations in the joins method - joins(:bs).  On a lark, I decided to recode it to use a string - joins('INNER JOIN bs ON bs.a_id ='), and it worked:  I got the inner join for B and three individual queries for X, Y, and Z.  Because Y is queried as a simple query with an array of IDs, the fact that Y is in another database isn't a problem - it just works.

Anyway, if you've stumbled across this post while trying to solve the same problem, I hope this helps.


Thursday, May 23, 2013

Ctags for Puppet - Three (previously missing) Pieces

Back in the day, when I was coding in C on the Unix kernel (before Linux even existed), I used vi's tags functionality extensively.  We had a patched version of vi (before vim existed) that supported tag stacks and a hacked version of ctags that picked up all kinds of things like #defines, and it used the -D flags you used when compiling to get you to the right definition of something that was defined many times for various architectures, etc.  But, when I moved to C++ with function overloading, ctags broke down for me, and I quit using it.

Recently, I inherited pretty big Puppet code base.  For a long time, I was just navigating it by hand using lots of find and grep commands.  Finally, I broke down and figured out how to get ctags working for my Puppet code on OS X.  Actually, other people figured it out, but here were the three pieces I had to string together.

A modern version of ctags - aka exuberant ctags.  This is pretty easy to install with homebrew, but there is a rub: OS X already has a version of it installed, and depending on how your PATH is configured, the stock version might trump homebrew's version.  Matt Pollito has a nice, concise blog post explaining how to cope with that.

Tell ctags about Puppet's syntax: Paul Nasrat has a little post describing the definitions needed in the ~/.ctags file and the invocation of ctags.

Tell vim about Puppet's syntax: Netdata's virmrc file has the last piece:
set iskeyword=-,:,@,48-57,_,192-255
The colon is the key there (no pun intended) - without that, vim wasn't dealing with scoped identifiers and was just hitting the top-level modules.

The last bit is for me to re-learn the muscle memory for navigating with tags that has atrophied after 20 years give or take.  BTW, if you don't have tags, a cool approximation within a single file is '*' in command mode - it searches for the word under the cursor.


Tuesday, May 07, 2013

Hadoop Beginner's Guide

Hadoop Beginner's Guide by Garry Turkington
ISBN: 1849517304

Hadoop Beginner's Guide is, as the title suggests, a new introductory book to the Hadoop ecosystem.  It provides an introduction to how to get up and running with the core components of Hadoop (Map-Reduce and HDFS),  some higher level tools like Hive, integration tools like Sqoop and Flume, and it also provides some good starting information relating to operational issues with Hadoop. This is not an exhaustive reference like Hadoop: The Definitive Guide, and for a beginner, that's probably a good thing.  (In my day, we only had The Definitive Guide, and we liked it!)

Most of the topics are covered in a "dive right in" format.  After some brief introduction to the topic the author provides a list of commands or a block of code and invites you to run it.  This is followed by "What just happened?" that explains the details of the operation or code.  Personally, I don't care for that too much because the explanation is sometimes separated from the code by multiple pages, which was a real hassle reading this as a PDF.  But, maybe that's just me.

As I mentioned, the book includes a couple of chapters on operations, which I found to be a nice addition to a beginner's book.  Some of these operational details were explained by hands-on experiments like shutting down processes or nodes, in which case "What just happened?" is more like "What just broke?"  The operational scenarios are by no means exhaustive (that's what you learn from production), but they provide the reader with some "real life" experience gained in a low-risk environment.  And, they introduce a powerful method to learn more operational details: set up an experiment and find out what happens.  Learning to learn is the most valuable thing you can gain from any book, class, or seminar.

Another nice feature of this book that I haven't seen in others is that the author includes examples of Amazon EC2 and Elastic Map Reduce (EMR).  There are examples of both Map Reduce and Hive jobs on EMR.  He doesn't do everything with "raw" Map Reduce and EMR because once you know the basics of EMR, the same principles apply to both raw Hadoop and EMR.

I do have some complaints about the book, but many of them are nit-picking or personal style.  That said, I think the biggest thing this book would benefit from would be some very detailed "technical editing."  By that I mean there are technical details that got corrupted during the book production process.  For example, the hadoop command is often rendered as Hadoop in examples.  There are plenty of similar formatting and typographic errors. Of course, an experienced Hadoop user wouldn't be tripped up by these, but this is a "beginner's guide," and such details can cause tremendous pain and suffering for newbies.

To wrap things up, Hadoop Beginner's Guide is a pretty good introduction to the Hadoop ecosystem.  I'd recommend it to anyone just starting out with Hadoop before moving on to something more reference-oriented like The Definitive Guide.


FTC disclaimer: I received a free review copy of this book from DZone.  The links to Amazon above contain my Amazon Associates tag.