Chapter 9
Finding and fixing your mistakes

To err might be human, but to really handle the consequences well takes a top-notch revision control system. In this chapter, we’ll discuss some of the techniques you can use when you find that a problem has crept into your project. Mercurial has some highly capable features that will help you to isolate the sources of problems, and to handle them appropriately.

9.1 Erasing local history

9.1.1 The accidental commit

I have the occasional but persistent problem of typing rather more quickly than I can think, which sometimes results in me committing a changeset that is either incomplete or plain wrong. In my case, the usual kind of incomplete changeset is one in which I’ve created a new source file, but forgotten to “hg add” it. A “plain wrong” changeset is not as common, but no less annoying.

9.1.2 Rolling back a transaction

In section 4.2.2, I mentioned that Mercurial treats each modification of a repository as a transaction. Every time you commit a changeset or pull changes from another repository, Mercurial remembers what you did. You can undo, or roll back, exactly one of these actions using the “hg rollback” command. (See section 9.1.4 for an important caveat about the use of this command.)

Here’s a mistake that I often find myself making: committing a change in which I’ve created a new file, but forgotten to “hg add” it.

$ hg status
M a
$ echo b > b
$ hg commit -m 'Add file b'

Looking at the output of “hg status” after the commit immediately confirms the error.

$ hg status
? b
$ hg tip
changeset:   1:8a557de9a817
tag:         tip
user:        Bryan O'Sullivan <bos@serpentine.com>
date:        Mon Dec 10 19:54:38 2007 +0000
summary:     Add file b

The commit captured the changes to the file a, but not the new file b. If I were to push this changeset to a repository that I shared with a colleague, the chances are high that something in a would refer to b, which would not be present in their repository when they pulled my changes. I would thus become the object of some indignation.

However, luck is with me: I’ve caught my error before I pushed the changeset. I use the “hg rollback” command, and Mercurial makes that last changeset vanish.

$ hg rollback
rolling back last transaction
$ hg tip
changeset:   0:15f839321f52
tag:         tip
user:        Bryan O'Sullivan <bos@serpentine.com>
date:        Mon Dec 10 19:54:38 2007 +0000
summary:     First commit

$ hg status
M a
? b

Notice that the changeset is no longer present in the repository’s history, and the working directory once again thinks that the file a is modified. The commit and rollback have left the working directory exactly as it was prior to the commit; the changeset has been completely erased. I can now safely “hg add” the file b, and rerun my commit.

$ hg add b
$ hg commit -m 'Add file b, this time for real'

9.1.3 The erroneous pull

It’s common practice with Mercurial to maintain separate development branches of a project in different repositories. Your development team might have one shared repository for your project’s “0.9” release, and another, containing different changes, for the “1.0” release.

Given this, you can imagine that the consequences could be messy if you had a local “0.9” repository, and accidentally pulled changes from the shared “1.0” repository into it. At worst, you could be paying insufficient attention, and push those changes into the shared “0.9” tree, confusing your entire team (but don’t worry, we’ll return to this horror scenario later). However, it’s more likely that you’ll notice immediately, because Mercurial will display the URL it’s pulling from, or you will see it pull a suspiciously large number of changes into the repository.

The “hg rollback” command will work nicely to expunge all of the changesets that you just pulled. Mercurial groups all changes from one “hg pull” into a single transaction, so one “hg rollback” is all you need to undo this mistake.

9.1.4 Rolling back is useless once you’ve pushed

The value of the “hg rollback” command drops to zero once you’ve pushed your changes to another repository. Rolling back a change makes it disappear entirely, but only in the repository in which you perform the “hg rollback”. Because a rollback eliminates history, there’s no way for the disappearance of a change to propagate between repositories.

If you’ve pushed a change to another repository (particularly if it’s a shared repository), it has essentially “escaped into the wild,” and you’ll have to recover from your mistake in a different way. If you push a changeset somewhere, then roll it back, then pull from the repository you pushed to, the changeset will reappear in your repository.

(If you absolutely know for sure that the change you want to roll back is the most recent change in the repository that you pushed to, and you know that nobody else could have pulled it from that repository, you can roll back the changeset there, too, but you really should not rely on this working reliably. If you do this, sooner or later a change really will make it into a repository that you don’t directly control (or have forgotten about), and come back to bite you.)

9.1.5 You can only roll back once

Mercurial stores exactly one transaction in its transaction log; that transaction is the most recent one that occurred in the repository. This means that you can only roll back one transaction. If you expect to be able to roll back one transaction, then its predecessor, this is not the behaviour you will get.

$ hg rollback
rolling back last transaction
$ hg rollback
no rollback information available

Once you’ve rolled back one transaction in a repository, you can’t roll back again in that repository until you perform another commit or pull.

9.2 Reverting the mistaken change

If you make a modification to a file, and decide that you really didn’t want to change the file at all, and you haven’t yet committed your changes, the “hg revert” command is the one you’ll need. It looks at the changeset that’s the parent of the working directory, and restores the contents of the file to their state as of that changeset. (That’s a long-winded way of saying that, in the normal case, it undoes your modifications.)

Let’s illustrate how the “hg revert” command works with yet another small example. We’ll begin by modifying a file that Mercurial is already tracking.

$ cat file
original content
$ echo unwanted change >> file
$ hg diff file
diff -r 7365fa775ae5 file
--- a/file Mon Dec 10 19:54:27 2007 +0000
+++ b/file Mon Dec 10 19:54:27 2007 +0000
@@ -1,1 +1,2 @@ original content
 original content
+unwanted change

If we don’t want that change, we can simply “hg revert” the file.

$ hg status
M file
$ hg revert file
$ cat file
original content

The “hg revert” command provides us with an extra degree of safety by saving our modified file with a .orig extension.

$ hg status
? file.orig
$ cat file.orig
original content
unwanted change

Here is a summary of the cases that the “hg revert” command can deal with. We will describe each of these in more detail in the section that follows.

9.2.1 File management errors

The “hg revert” command is useful for more than just modified files. It lets you reverse the results of all of Mercurial’s file management commands: “hg add”, “hg remove”, and so on.

If you “hg add” a file, then decide that in fact you don’t want Mercurial to track it, use “hg revert” to undo the add. Don’t worry; Mercurial will not modify the file in any way. It will just “unmark” the file.

$ echo oops > oops
$ hg add oops
$ hg status oops
A oops
$ hg revert oops
$ hg status
? oops

Similarly, if you ask Mercurial to “hg remove” a file, you can use “hg revert” to restore it to the contents it had as of the parent of the working directory.

$ hg remove file
$ hg status
R file
$ hg revert file
$ hg status
$ ls file
file

This works just as well for a file that you deleted by hand, without telling Mercurial (recall that in Mercurial terminology, this kind of file is called “missing”).

$ rm file
$ hg status
! file
$ hg revert file
$ ls file
file

If you revert an “hg copy”, the copied-to file remains in your working directory afterwards, untracked. Since a copy doesn’t affect the copied-from file in any way, Mercurial doesn’t do anything with the copied-from file.

$ hg copy file new-file
$ hg revert new-file
$ hg status
? new-file

A slightly special case: reverting a rename

If you “hg rename” a file, there is one small detail that you should remember. When you “hg revert” a rename, it’s not enough to provide the name of the renamed-to file, as you can see here.

$ hg rename file new-file
$ hg revert new-file
$ hg status
? new-file

As you can see from the output of “hg status”, the renamed-to file is no longer identified as added, but the renamed-from file is still removed! This is counter-intuitive (at least to me), but at least it’s easy to deal with.

$ hg revert file
no changes needed to file
$ hg status
? new-file

So remember, to revert an “hg rename”, you must provide both the source and destination names.

(By the way, if you rename a file, then modify the renamed-to file, then revert both components of the rename, when Mercurial restores the file that was removed as part of the rename, it will be unmodified. If you need the modifications in the renamed-to file to show up in the renamed-from file, don’t forget to copy them over.)

These fiddly aspects of reverting a rename arguably constitute a small bug in Mercurial.

9.3 Dealing with committed changes

Consider a case where you have committed a change a, and another change b on top of it; you then realise that change a was incorrect. Mercurial lets you “back out” an entire changeset automatically, and provides building blocks that let you reverse part of a changeset by hand.

Before you read this section, here’s something to keep in mind: the “hg backout” command undoes changes by adding history, not by modifying or erasing it. It’s the right tool to use if you’re fixing bugs, but not if you’re trying to undo some change that has catastrophic consequences. To deal with those, see section 9.4.

9.3.1 Backing out a changeset

The “hg backout” command lets you “undo” the effects of an entire changeset in an automated fashion. Because Mercurial’s history is immutable, this command does not get rid of the changeset you want to undo. Instead, it creates a new changeset that reverses the effect of the to-be-undone changeset.

The operation of the “hg backout” command is a little intricate, so let’s illustrate it with some examples. First, we’ll create a repository with some simple changes.

$ hg init myrepo
$ cd myrepo
$ echo first change >> myfile
$ hg add myfile
$ hg commit -m 'first change'
$ echo second change >> myfile
$ hg commit -m 'second change'

The “hg backout” command takes a single changeset ID as its argument; this is the changeset to back out. Normally, “hg backout” will drop you into a text editor to write a commit message, so you can record why you’re backing the change out. In this example, we provide a commit message on the command line using the -m option.

9.3.2 Backing out the tip changeset

We’re going to start by backing out the last changeset we committed.

$ hg backout -m 'back out second change' tip
reverting myfile
changeset 2:e02eba531f95 backs out changeset 1:c3e45317eb42
$ cat myfile
first change

You can see that the second line from myfile is no longer present. Taking a look at the output of “hg log” gives us an idea of what the “hg backout” command has done.

$ hg log --style compact
2[tip]   e02eba531f95   2007-12-10 19:53 +0000   bos
  back out second change

1   c3e45317eb42   2007-12-10 19:53 +0000   bos
  second change

0   f3db226c9812   2007-12-10 19:53 +0000   bos
  first change

Notice that the new changeset that “hg backout” has created is a child of the changeset we backed out. It’s easier to see this in figure 9.1, which presents a graphical view of the change history. As you can see, the history is nice and linear.


Figure 9.1: Backing out a change using the “hg backout” command

9.3.3 Backing out a non-tip change

If you want to back out a change other than the last one you committed, pass the --merge option to the “hg backout” command.

$ cd ..
$ hg clone -r1 myrepo non-tip-repo
requesting all changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 2 changes to 1 files
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cd non-tip-repo

This makes backing out any changeset a “one-shot” operation that’s usually simple and fast.

$ echo third change >> myfile
$ hg commit -m 'third change'
$ hg backout --merge -m 'back out second change' 1
reverting myfile
changeset 3:dc2f0a481aab backs out changeset 1:c3e45317eb42
merging with changeset 2:6beeebc9c7f6
merging myfile
0 files updated, 1 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)

If you take a look at the contents of myfile after the backout finishes, you’ll see that the first and third changes are present, but not the second.

$ cat myfile
first change
third change

As the graphical history in figure 9.2 illustrates, Mercurial actually commits two changes in this kind of situation (the box-shaped nodes are the ones that Mercurial commits automatically). Before Mercurial begins the backout process, it first remembers what the current parent of the working directory is. It then backs out the target changeset, and commits that as a changeset. Finally, it merges back to the previous parent of the working directory, and commits the result of the merge.


Figure 9.2: Automated backout of a non-tip change using the “hg backout” command

The result is that you end up “back where you were”, only with some extra history that undoes the effect of the changeset you wanted to back out.

Always use the --merge option

In fact, since the --merge option will do the “right thing” whether or not the changeset you’re backing out is the tip (i.e. it won’t try to merge if it’s backing out the tip, since there’s no need), you should always use this option when you run the “hg backout” command.

9.3.4 Gaining more control of the backout process

While I’ve recommended that you always use the --merge option when backing out a change, the “hg backout” command lets you decide how to merge a backout changeset. Taking control of the backout process by hand is something you will rarely need to do, but it can be useful to understand what the “hg backout” command is doing for you automatically. To illustrate this, let’s clone our first repository, but omit the backout change that it contains.

$ cd ..
$ hg clone -r1 myrepo newrepo
requesting all changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 2 changes to 1 files
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cd newrepo

As with our earlier example, we’ll commit a third changeset, then back out its parent, and see what happens.

$ echo third change >> myfile
$ hg commit -m 'third change'
$ hg backout -m 'back out second change' 1
reverting myfile
changeset 3:dc2f0a481aab backs out changeset 1:c3e45317eb42
the backout changeset is a new head - do not forget to merge
(use "backout --merge" if you want to auto-merge)

Our new changeset is again a descendant of the changeset we backed out; it’s thus a new head, not a descendant of the changeset that was the tip. The “hg backout” command was quite explicit in telling us this.

$ hg log --style compact
3[tip]:1   dc2f0a481aab   2007-12-10 19:53 +0000   bos
  back out second change

2   6beeebc9c7f6   2007-12-10 19:53 +0000   bos
  third change

1   c3e45317eb42   2007-12-10 19:53 +0000   bos
  second change

0   f3db226c9812   2007-12-10 19:53 +0000   bos
  first change

Again, it’s easier to see what has happened by looking at a graph of the revision history, in figure 9.3. This makes it clear that when we use “hg backout” to back out a change other than the tip, Mercurial adds a new head to the repository (the change it committed is box-shaped).


Figure 9.3: Backing out a change using the “hg backout” command

After the “hg backout” command has completed, it leaves the new “backout” changeset as the parent of the working directory.

$ hg parents
changeset:   3:dc2f0a481aab
tag:         tip
parent:      1:c3e45317eb42
user:        Bryan O'Sullivan <bos@serpentine.com>
date:        Mon Dec 10 19:53:16 2007 +0000
summary:     back out second change

Now we have two isolated sets of changes.

$ hg heads
changeset:   3:dc2f0a481aab
tag:         tip
parent:      1:c3e45317eb42
user:        Bryan O'Sullivan <bos@serpentine.com>
date:        Mon Dec 10 19:53:16 2007 +0000
summary:     back out second change

changeset:   2:6beeebc9c7f6
user:        Bryan O'Sullivan <bos@serpentine.com>
date:        Mon Dec 10 19:53:16 2007 +0000
summary:     third change

Let’s think about what we expect to see as the contents of myfile now. The first change should be present, because we’ve never backed it out. The second change should be missing, as that’s the change we backed out. Since the history graph shows the third change as a separate head, we don’t expect to see the third change present in myfile.

$ cat myfile
first change

To get the third change back into the file, we just do a normal merge of our two heads.

$ hg merge
merging myfile
0 files updated, 1 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
$ hg commit -m 'merged backout with previous tip'
$ cat myfile
first change
third change

Afterwards, the graphical history of our repository looks like figure 9.4.



Figure 9.4: Manually merging a backout change

9.3.5 Why “hg backout” works as it does

Here’s a brief description of how the “hg backout” command works.

  1. It ensures that the working directory is “clean”, i.e. that the output of “hg status” would be empty.
  2. It remembers the current parent of the working directory. Let’s call this changeset orig.
  3. It does the equivalent of an “hg update” to sync the working directory to the changeset you want to back out. Let’s call this changeset backout.
  4. It finds the parent of that changeset. Let’s call that changeset parent.
  5. For each file that the backout changeset affected, it does the equivalent of an “hg revert -r parent” on that file, to restore it to the contents it had before that changeset was committed.
  6. It commits the result as a new changeset. This changeset has backout as its parent.
  7. If you specify --merge on the command line, it merges with orig, and commits the result of the merge.

An alternative way to implement the “hg backout” command would be to “hg export” the to-be-backed-out changeset as a diff, then use the --reverse option to the patch command to reverse the effect of the change without fiddling with the working directory. This sounds much simpler, but it would not work nearly as well.

The reason that “hg backout” does an update, a commit, a merge, and another commit is to give the merge machinery the best chance to do a good job when dealing with all the changes between the change you’re backing out and the current tip.

If you’re backing out a changeset that’s 100 revisions back in your project’s history, the chances that the patch command will be able to apply a reverse diff cleanly are not good, because intervening changes are likely to have “broken the context” that patch uses to determine whether it can apply a patch (if this sounds like gibberish, see section 12.4 for a discussion of the patch command). Also, Mercurial’s merge machinery will handle files and directories being renamed, permission changes, and modifications to binary files, none of which patch can deal with.

9.4 Changes that should never have been

Most of the time, the “hg backout” command is exactly what you need if you want to undo the effects of a change. It leaves a permanent record of exactly what you did, both when committing the original changeset and when you cleaned up after it.

On rare occasions, though, you may find that you’ve committed a change that really should not be present in the repository at all. For example, it would be very unusual, and usually considered a mistake, to commit a software project’s object files as well as its source files. Object files have almost no intrinsic value, and they’re big, so they increase the size of the repository and the amount of time it takes to clone or pull changes.

Before I discuss the options that you have if you commit a “brown paper bag” change (the kind that’s so bad that you want to pull a brown paper bag over your head), let me first discuss some approaches that probably won’t work.

Since Mercurial treats history as accumulative (every change builds on top of all changes that preceded it), you generally can’t just make disastrous changes disappear. The one exception is when you’ve just committed a change, and it hasn’t been pushed or pulled into another repository. That’s when you can safely use the “hg rollback” command, as I detailed in section 9.1.2.

After you’ve pushed a bad change to another repository, you could still use “hg rollback” to make your local copy of the change disappear, but it won’t have the consequences you want. The change will still be present in the remote repository, so it will reappear in your local repository the next time you pull.

If a situation like this arises, and you know which repositories your bad change has propagated into, you can try to get rid of the change from every one of those repositories. This is, of course, not a satisfactory solution: if you miss even a single repository while you’re expunging, the change is still “in the wild”, and could propagate further.

If you’ve committed one or more changes after the change that you’d like to see disappear, your options are further reduced. Mercurial doesn’t provide a way to “punch a hole” in history while leaving the surrounding changesets intact.

XXX This needs filling out. The hg-replay script in the examples directory works, but doesn’t handle merge changesets. Kind of an important omission.

9.4.1 Protect yourself from “escaped” changes

If you’ve committed some changes to your local repository and they’ve been pushed or pulled somewhere else, this isn’t necessarily a disaster. You can protect yourself ahead of time against some classes of bad changeset. This is particularly easy if your team usually pulls changes from a central repository.

By configuring some hooks on that repository to validate incoming changesets (see chapter 10), you can automatically prevent some kinds of bad changeset from being pushed to the central repository at all. With such a configuration in place, some kinds of bad changeset will naturally tend to “die out” because they can’t propagate into the central repository. Better yet, this happens without any need for explicit intervention.

For instance, an incoming change hook that verifies that a changeset will actually compile can prevent people from inadvertently “breaking the build”.

9.5 Finding the source of a bug

While it’s all very well to be able to back out a changeset that introduced a bug, this requires that you know which changeset to back out. Mercurial provides an invaluable extension, called bisect, that helps you to automate this process and accomplish it very efficiently.

The idea behind the bisect extension is that a changeset has introduced some change of behaviour that you can identify with a simple binary test. You don’t know which piece of code introduced the change, but you know how to test for the presence of the bug. The bisect extension uses your test to direct its search for the changeset that introduced the code that caused the bug.

Here are a few scenarios to help you understand how you might apply this extension.

From these examples, it should be clear that the bisect extension is not useful only for finding the sources of bugs. You can use it to find any “emergent property” of a repository (anything that you can’t find from a simple text search of the files in the tree) for which you can write a binary test.

We’ll introduce a little bit of terminology here, just to make it clear which parts of the search process are your responsibility, and which are Mercurial’s. A test is something that you run when bisect chooses a changeset. A probe is what bisect runs to tell whether a revision is good. Finally, we’ll use the word “bisect”, as both a noun and a verb, to stand in for the phrase “search using the bisect extension”.

One simple way to automate the searching process would be simply to probe every changeset. However, this scales poorly. If it took ten minutes to test a single changeset, and you had 10,000 changesets in your repository, the exhaustive approach would take on average 35 days to find the changeset that introduced a bug. Even if you knew that the bug was introduced by one of the last 500 changesets, and limited your search to those, you’d still be looking at over 40 hours to find the changeset that introduced your bug.

What the bisect extension does is use its knowledge of the “shape” of your project’s revision history to perform a search in time proportional to the logarithm of the number of changesets to check (the kind of search it performs is called a dichotomic search). With this approach, searching through 10,000 changesets takes about 14 tests, or a little under two and a half hours at ten minutes per test. Limit your search to the last 500 changesets, and it takes about 9 tests, or roughly an hour and a half.
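
The logarithmic estimate is easy to check with a few lines of shell arithmetic. This is a rough sketch that assumes a linear history and a search that halves the candidate range on every probe; the function name probes_needed is made up for the illustration, and the ten-minute figure is the one used in the text.

```shell
# Number of probes a halving search needs to narrow n candidates to one.
probes_needed() {
    remaining=$1
    probes=0
    while [ "$remaining" -gt 1 ]; do
        remaining=$(( (remaining + 1) / 2 ))   # keep the larger half (worst case)
        probes=$(( probes + 1 ))
    done
    echo "$probes"
}

minutes_per_test=10   # assumption taken from the text
for changesets in 10000 500; do
    count=$(probes_needed "$changesets")
    echo "$changesets changesets: $count probes, $(( count * minutes_per_test )) minutes"
done
```

Under these assumptions the loop reports 14 probes (140 minutes) for 10,000 changesets and 9 probes (90 minutes) for 500, against roughly 35 days and 40 hours on average for the exhaustive approach.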

The bisect extension is aware of the “branchy” nature of a Mercurial project’s revision history, so it has no problems dealing with branches, merges, or multiple heads in a repository. It can prune entire branches of history with a single probe, which is how it operates so efficiently.
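
To make the idea concrete, here is a deliberately simplified sketch of a dichotomic search. It is not how the extension is implemented: it assumes a strictly linear history identified by revision numbers, and the functions is_bad and first_bad, along with the choice of revision 22 as the culprit, are invented for the illustration.

```shell
# Stand-in for a real test: revisions 22 and later "have the bug".
# In practice this would build and test the checked-out revision.
is_bad() {
    [ "$1" -ge 22 ]
}

# Find the first bad revision between a known-good and a known-bad one,
# assuming a linear history numbered from good to bad.
first_bad() {
    good=$1
    bad=$2
    while [ $(( bad - good )) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if is_bad "$mid"; then
            bad=$mid      # the bug is at mid or earlier
        else
            good=$mid     # the bug was introduced after mid
        fi
    done
    echo "$bad"
}

first_bad 0 34
```

With revision 0 known good and revision 34 known bad, the search probes five revisions (17, 25, 21, 23 and 22) and prints 22, rather than testing all 35.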

9.5.1 Using the bisect extension

Here’s an example of bisect in action. To keep the core of Mercurial simple, bisect is packaged as an extension; this means that it won’t be present unless you explicitly enable it. To do this, edit your hgrc and add the following section header (if it’s not already present):

[extensions]

Then add a line to this section to enable the extension:

hbisect =

Note: That’s right, there’s an “h” at the front of the name of the bisect extension. The reason is that Mercurial is written in Python, and uses a standard Python package called bisect. If you omit the “h” from the name “hbisect”, Mercurial will erroneously find the standard Python bisect package, and try to use it as a Mercurial extension. This won’t work, and Mercurial will crash repeatedly until you fix the spelling in your hgrc. Ugh.

Now let’s create a repository, so that we can try out the bisect extension in isolation.

$ hg init mybug
$ cd mybug

We’ll simulate a project that has a bug in it in a simple-minded way: create trivial changes in a loop, and nominate one specific change that will have the “bug”. This loop creates 35 changesets, each adding a single file to the repository. We’ll represent our “bug” with a file that contains the text “i have a gub”.

$ buggy_change=22
$ for (( i = 0; i < 35; i++ )); do
>   if [[ $i = $buggy_change ]]; then
>     echo 'i have a gub' > myfile$i
>     hg commit -q -A -m 'buggy changeset'
>   else
>     echo 'nothing to see here, move along' > myfile$i
>     hg commit -q -A -m 'normal changeset'
>   fi
> done

The next thing that we’d like to do is figure out how to use the bisect extension. We can use Mercurial’s normal built-in help mechanism for this.

$ hg help bisect
hg bisect [help|init|reset|next|good|bad]

Dichotomic search in the DAG of changesets

This extension helps to find changesets which cause problems.
To use, mark the earliest changeset you know introduces the problem
as bad, then mark the latest changeset which is free from the problem
as good. Bisect will update your working directory to a revision for
testing. Once you have performed tests, mark the working directory
as bad or good and bisect will either update to another candidate
changeset or announce that it has found the bad revision.

Note: bisect expects bad revisions to be descendants of good revisions.
If you are looking for the point at which a problem was fixed, then make
the problem-free state "bad" and the problematic state "good."

For subcommands see "hg bisect help"

use "hg -v help bisect" to show global options
$ hg bisect help
list of subcommands for the bisect extension

 bad     mark revision as bad and update to the next</