July 27, 2008

Mercurial Woes

Over the past few days, my friends Ben Collins-Sussman, Jim Blandy, and I have been having an interesting conversation about using Mercurial for development collaboration. Eventually one of my email responses got so long-winded that I figured it’d be best to make the conversation public.

So, here’s my take on Mercurial, and some reasons why an HG birds-of-a-feather (BoF) session at next week’s Mozilla Summit would be very useful for me.

To begin with, from a purely social standpoint, the concept of distributed revision control amazes me because it removes many technological barriers to collaboration, giving software projects an enormous amount of freedom in how they structure their development process. For any readers who aren’t familiar with it, check out Chapter 1 of the HG Book.

But with all this additional freedom comes additional responsibility. To quote the MDC Mercurial basics page: “this gun is loaded.”

I’m used to working on small-to-medium-sized projects with relatively small teams. Subversion was great for this, because when we worked on things simultaneously, we rarely ended up editing the same file, much less the same part of the same file. There was rarely any reason to create SVN branches, though we all knew how to do it and did so when necessary. As a result, merges were very rare, and when we had to merge, we were extremely careful and diligent about it.

I’m still working on small-to-medium-sized projects (e.g. Weave and Ubiquity), and the forced merging that HG makes us do almost every time we push is, relatively speaking, a world of pain. With SVN, I’d just run svn commit and see whether SVN rejected it because someone else had committed a change to the same file while I was editing it. This happened rarely, and when it did, we were careful to ensure that our changes gelled. In this sense, SVN was really humane: 90% of the time things “just worked”, and when they didn’t, it was for very good reasons.

But HG almost never “just works”. If I edit a.py and my friend edits b.py and pushes it before I’ve pushed my changes, I have to make a merge commit and manually ensure that nothing bad happened. The end result is a much heavier burden on each programmer than with SVN: nearly every push needs its own merge commit, which encourages people either to (A) push infrequently or (B) ignore their merge commits (a practice encouraged by hg fetch, which pulls, merges, and commits the merge in a single step). The disadvantages of the first approach are nicely explained in Ben’s post on Programmer Insecurity; the latter approach is bad for obvious reasons.
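To make the a.py/b.py scenario concrete, here’s a toy Python sketch of the two history models. It’s purely illustrative: Changeset, svn_commit, and heads are hypothetical names that belong to neither tool, and the SVN side is simplified to a single whole-repository check. On the HG side, the point is that my changeset and my friend’s both descend from the same parent, so pulling leaves two heads, and since hg push refuses by default to create a new remote head, I need a two-parent merge changeset before I can push, even though we touched different files.

```python
# Toy model (hypothetical; not either tool's real API) contrasting SVN's
# linear history with Mercurial's changeset DAG for the a.py/b.py scenario.
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    name: str
    parents: tuple[str, ...]
    files: frozenset[str]


def svn_commit(history: list[Changeset], base: str, name: str, files: set[str]) -> bool:
    """Append a revision unless a file we touched changed after our base.

    Simplification: real SVN tracks a revision per working-copy file.
    """
    names = [c.name for c in history]
    newer = history[names.index(base) + 1:]
    if any(c.files & files for c in newer):
        return False  # "out of date": update and resolve before committing
    history.append(Changeset(name, (history[-1].name,), frozenset(files)))
    return True


def heads(dag: dict[str, Changeset]) -> set[str]:
    """Changesets that no other changeset lists as a parent."""
    used_as_parent = {p for c in dag.values() for p in c.parents}
    return set(dag) - used_as_parent


# SVN: my friend commits b.py, then my a.py commit from the same base succeeds.
history = [Changeset("r0", (), frozenset({"a.py", "b.py"}))]
svn_commit(history, "r0", "r1", {"b.py"})
svn_commit(history, "r0", "r2", {"a.py"})
print([c.name for c in history])  # ['r0', 'r1', 'r2']: still linear

# HG: both changesets descend from r0, so after pulling there are two heads...
dag = {"r0": history[0]}
dag["friend"] = Changeset("friend", ("r0",), frozenset({"b.py"}))
dag["mine"] = Changeset("mine", ("r0",), frozenset({"a.py"}))
print(sorted(heads(dag)))  # ['friend', 'mine']

# ...and a two-parent merge changeset is needed before pushing, even
# though the two changes touched different files.
dag["merge"] = Changeset("merge", ("mine", "friend"), frozenset())
print(sorted(heads(dag)))  # ['merge']
```

In this model the merge changeset carries no file changes of its own; it exists only to join the two lines of development, and that’s exactly the bookkeeping that piles up when nearly every push needs one.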

This is basically the axis around which all my woes with HG revolve. With SVN, it’s really easy to see how code has changed, but with HG the constant merging of tiny branches obscures the whole code history, making it hard to tell what’s happened. In fact, several weeks ago a friend somehow mis-merged his commits to Weave, undoing a major refactoring I had done, and the really scary thing is that I couldn’t tell this had happened from the diff logs alone. I looked at them for a good half-hour and was still scratching my head. Needless to say, my inability to understand what had happened to the code from the logs drastically reduced my faith in the tool.

While everyone I know understands the basics of HG and the philosophy behind distributed VCS, it’s the particulars of actually “working in the wild” that many find very confusing. That’s why an HG BoF at the Mozilla Summit would be extremely useful.