Mercurial does not work with files in your repository unless you tell it to manage them. The hg status command will tell you which files Mercurial doesn't know about; it uses a “?” to display such files.
To tell Mercurial to track a file, use the hg add command. Once you have added a file, the entry in the output of hg status for that file changes from “?” to “A”.
    $ hg init add-example
    $ cd add-example
    $ echo a > myfile.txt
    $ hg status
    ? myfile.txt
    $ hg add myfile.txt
    $ hg status
    A myfile.txt
    $ hg commit -m 'Added one file'
    $ hg status
After you run a hg commit, the files that you added before the commit will no longer be listed in the output of hg status. The reason for this is that by default, hg status only tells you about “interesting” files—those that you have (for example) modified, removed, or renamed. If you have a repository that contains thousands of files, you will rarely want to know about files that Mercurial is tracking, but that have not changed. (You can still get this information; we'll return to this later.)
Once you add a file, Mercurial doesn't do anything with it immediately. Instead, it will take a snapshot of the file's state the next time you perform a commit. It will then continue to track the changes you make to the file every time you commit, until you remove the file.
Mercurial has a useful behavior: when you pass the name of a directory to any command, the command treats it as “I want to operate on every file in this directory and its subdirectories”.
    $ mkdir b
    $ echo b > b/somefile.txt
    $ echo c > b/source.cpp
    $ mkdir b/d
    $ echo d > b/d/test.h
    $ hg add b
    adding b/d/test.h
    adding b/somefile.txt
    adding b/source.cpp
    $ hg commit -m 'Added all files in subdirectory'
Notice in this example that Mercurial printed the names of the files it added, whereas it didn't do so when we added the file named myfile.txt in the earlier example.
In the earlier example, we explicitly named the file to add on the command line. Mercurial assumes in such cases that we know what we are doing, and it doesn't print any output.
However, when we imply the names of files by giving the name of a directory, Mercurial takes the extra step of printing the name of each file that it does something with. This makes it more clear what is happening, and reduces the likelihood of a silent and nasty surprise. This behavior is common to most Mercurial commands.
Mercurial does not track directory information. Instead, it tracks the path to a file. Before creating a file, it first creates any missing directory components of the path. After it deletes a file, it then deletes any empty directories that were in the deleted file's path. This sounds like a trivial distinction, but it has one minor practical consequence: it is not possible to represent a completely empty directory in Mercurial.
Empty directories are rarely useful, and there are unintrusive workarounds that you can use to achieve an appropriate effect. The developers of Mercurial thus felt that the complexity that would be required to manage empty directories was not worth the limited benefit this feature would bring.
If you need an empty directory in your repository, there are a few ways to achieve this. One is to create a directory, then hg add a “hidden” file to that directory. On Unix-like systems, any file name that begins with a period (“.”) is treated as hidden by most commands and GUI tools. This approach is illustrated below.
    $ hg init hidden-example
    $ cd hidden-example
    $ mkdir empty
    $ touch empty/.hidden
    $ hg add empty/.hidden
    $ hg commit -m 'Manage an empty-looking directory'
    $ ls empty
    $ cd ..
    $ hg clone hidden-example tmp
    updating working directory
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
    $ ls tmp
    empty
    $ ls tmp/empty
Another way to handle the need for an empty directory is simply to create it in your automated build scripts before they need it.
Once you decide that a file no longer belongs in your repository, use the hg remove command. This deletes the file, and tells Mercurial to stop tracking it (which will occur at the next commit). A removed file is represented in the output of hg status with an “R”.
    $ hg init remove-example
    $ cd remove-example
    $ echo a > a
    $ mkdir b
    $ echo b > b/b
    $ hg add a b
    adding b/b
    $ hg commit -m 'Small example for file removal'
    $ hg remove a
    $ hg status
    R a
    $ hg remove b
    removing b/b
After you hg remove a file, Mercurial will no longer track changes to that file, even if you recreate a file with the same name in your working directory. If you do recreate a file with the same name and want Mercurial to track the new file, simply hg add it. Mercurial will know that the newly added file is not related to the old file of the same name.
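It is important to understand that removing a file has only two effects: it deletes the current version of the file from the working directory, and it tells Mercurial to stop tracking the file as of the next commit.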
Removing a file does not in any way alter the history of the file.
If you update the working directory to a changeset that was committed when it was still tracking a file that you later removed, the file will reappear in the working directory, with the contents it had when you committed that changeset. If you then update the working directory to a later changeset, in which the file had been removed, Mercurial will once again remove the file from the working directory.
Mercurial considers a file that you have deleted, but not used hg remove to delete, to be missing. A missing file is represented with “!” in the output of hg status. Mercurial commands will not generally do anything with missing files.
    $ hg init missing-example
    $ cd missing-example
    $ echo a > a
    $ hg add a
    $ hg commit -m 'File about to be missing'
    $ rm a
    $ hg status
    ! a
If your repository contains a file that hg status reports as missing, and you want the file to stay gone, you can run hg remove --after at any time later on, to tell Mercurial that you really did mean to remove the file.
    $ hg remove --after a
    $ hg status
    R a
On the other hand, if you deleted the missing file by accident, give hg revert the name of the file to recover. It will reappear, in unmodified form.
    $ hg revert a
    $ cat a
    a
    $ hg status
You might wonder why Mercurial requires you to explicitly tell it that you are deleting a file. Early during the development of Mercurial, it let you delete a file however you pleased; Mercurial would notice the absence of the file automatically when you next ran a hg commit, and stop tracking the file. In practice, this made it too easy to accidentally remove a file without noticing.
Mercurial offers a combination command, hg addremove, that adds untracked files and marks missing files as removed.
    $ hg init addremove-example
    $ cd addremove-example
    $ echo a > a
    $ echo b > b
    $ hg addremove
    adding a
    adding b
The hg commit command also provides a -A option that performs this same add-and-remove, immediately followed by a commit.
    $ echo c > c
    $ hg commit -A -m 'Commit with addremove'
    adding c
Mercurial provides a hg copy command that lets you make a new copy of a file. When you copy a file using this command, Mercurial makes a record of the fact that the new file is a copy of the original file. It treats these copied files specially when you merge your work with someone else's.
What happens during a merge is that changes “follow” a copy. To best illustrate what this means, let's create an example. We'll start with the usual tiny repository that contains a single file.
    $ hg init my-copy
    $ cd my-copy
    $ echo line > file
    $ hg add file
    $ hg commit -m 'Added a file'
We need to do some work in parallel, so that we'll have something to merge. So let's clone our repository.
    $ cd ..
    $ hg clone my-copy your-copy
    updating working directory
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
Back in our initial repository, let's use the hg copy command to make a copy of the first file we created.
    $ cd my-copy
    $ hg copy file new-file
If we look at the output of the hg status command afterwards, the copied file looks just like a normal added file.
    $ hg status
    A new-file
But if we pass the -C option to hg status, it prints another line of output: this is the file that our newly-added file was copied from.
    $ hg status -C
    A new-file
      file
    $ hg commit -m 'Copied file'
Now, back in the repository we cloned, let's make a change in parallel. We'll add a line of content to the original file that we created.
    $ cd ../your-copy
    $ echo 'new contents' >> file
    $ hg commit -m 'Changed file'
Now we have a modified file in this repository. When we pull the changes from the first repository, and merge the two heads, Mercurial will propagate the changes that we made locally to file into its copy, new-file.
    $ hg pull ../my-copy
    pulling from ../my-copy
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files (+1 heads)
    (run 'hg heads' to see heads, 'hg merge' to merge)
    $ hg merge
    merging file and new-file to new-file
    0 files updated, 1 files merged, 0 files removed, 0 files unresolved
    (branch merge, don't forget to commit)
    $ cat new-file
    line
    new contents
This behavior—of changes to a file propagating out to copies of the file—might seem esoteric, but in most cases it's highly desirable.
First of all, remember that this propagation only happens when you merge. So if you hg copy a file, and subsequently modify the original file during the normal course of your work, nothing will happen.
The second thing to know is that modifications will only propagate across a copy as long as the changeset that you're merging changes from hasn't yet seen the copy.
The reason that Mercurial does this is as follows. Let's say I make an important bug fix in a source file, and commit my changes. Meanwhile, you've decided to hg copy the file in your repository, without knowing about the bug or having seen the fix, and you have started hacking on your copy of the file.
If you pulled and merged my changes, and Mercurial didn't propagate changes across copies, your new source file would now contain the bug, and unless you knew to propagate the bug fix by hand, the bug would remain in your copy of the file.
By automatically propagating the change that fixed the bug from the original file to the copy, Mercurial prevents this class of problem. To my knowledge, Mercurial is the only revision control system that propagates changes across copies like this.
Once your change history has a record that the copy and subsequent merge occurred, there's usually no further need to propagate changes from the original file to the copied file, and that's why Mercurial only propagates changes across copies at the first merge, and not afterwards.
If, for some reason, you decide that this business of automatically propagating changes across copies is not for you, simply use your system's normal file copy command (on Unix-like systems, that's cp) to make a copy of a file, then hg add the new copy by hand. Before you do so, though, please do reread the section called “Why should changes follow copies?”, and make an informed decision that this behavior is not appropriate to your specific case.
When you use the hg copy command, Mercurial makes a copy of each source file as it currently stands in the working directory. This means that if you make some modifications to a file, then hg copy it without first having committed those changes, the new copy will also contain the modifications you have made up until that point. (I find this behavior a little counterintuitive, which is why I mention it here.)
The hg copy command acts similarly to the Unix cp command (you can use the hg cp alias if you prefer). We must supply two or more arguments, of which the last is treated as the destination, and all others are sources.
If you pass hg copy a single file as the source, and the destination does not exist, it creates a new file with that name.
    $ mkdir k
    $ hg copy a k
    $ ls k
    a
If the destination is a directory, Mercurial copies its sources into that directory.
    $ mkdir d
    $ hg copy a b d
    $ ls d
    a b
Copying a directory is recursive, and preserves the directory structure of the source.
    $ hg copy z e
    copying z/a/c to e/a/c
If the source and destination are both directories, the source tree is recreated in the destination directory.
    $ hg copy z d
    copying z/a/c to d/z/a/c
As with the hg remove command, if you copy a file manually and then want Mercurial to know that you've copied the file, simply use the --after option to hg copy.
    $ cp a n
    $ hg copy --after a n
It's rather more common to need to rename a file than to make a copy of it. The reason I discussed the hg copy command before talking about renaming files is that Mercurial treats a rename in essentially the same way as a copy. Therefore, knowing what Mercurial does when you copy a file tells you what to expect when you rename a file.
When you use the hg rename command, Mercurial makes a copy of each source file, then deletes it and marks the file as removed.
    $ hg rename a b
The hg status command shows the newly copied file as added, and the copied-from file as removed.
    $ hg status
    A b
    R a
As with the results of a hg copy, we must use the -C option to hg status to see that the added file is really being tracked by Mercurial as a copy of the original, now removed, file.
    $ hg status -C
    A b
      a
    R a
As with hg remove and hg copy, you can tell Mercurial about a rename after the fact using the --after option. In most other respects, the behavior of the hg rename command, and the options it accepts, are similar to the hg copy command.
If you're familiar with the Unix command line, you'll be glad to know that the hg rename command can be invoked as hg mv.
Since Mercurial's rename is implemented as copy-and-remove, the same propagation of changes happens when you merge after a rename as after a copy.
If I modify a file, and you rename it to a new name, and then we merge our respective changes, my modifications to the file under its original name will be propagated into the file under its new name. (This is something you might expect to “simply work,” but not all revision control systems actually do this.)
Whereas having changes follow a copy is a feature where you can perhaps nod and say “yes, that might be useful,” it should be clear that having them follow a rename is definitely important. Without this facility, it would simply be too easy for changes to become orphaned when files are renamed.
The case of diverging names occurs when two developers start with a file (let's call it foo) in their respective repositories.
    $ hg clone orig anne
    updating working directory
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
    $ hg clone orig bob
    updating working directory
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
Anne renames the file to bar.
    $ cd anne
    $ hg rename foo bar
    $ hg ci -m 'Rename foo to bar'
Meanwhile, Bob renames it to quux. (Remember that hg mv is an alias for hg rename.)
    $ cd ../bob
    $ hg mv foo quux
    $ hg ci -m 'Rename foo to quux'
I like to think of this as a conflict because each developer has expressed different intentions about what the file ought to be named.
What do you think should happen when they merge their work? Mercurial's actual behavior is that it always preserves both names when it merges changesets that contain divergent renames.
    # See http://www.selenic.com/mercurial/bts/issue455
    $ cd ../orig
    $ hg pull -u ../anne
    pulling from ../anne
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files
    1 files updated, 0 files merged, 1 files removed, 0 files unresolved
    $ hg pull ../bob
    pulling from ../bob
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files (+1 heads)
    (run 'hg heads' to see heads, 'hg merge' to merge)
    $ hg merge
    warning: detected divergent renames of foo to:
     bar
     quux
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
    (branch merge, don't forget to commit)
    $ ls
    bar quux
Notice that while Mercurial warns about the divergent renames, it leaves it up to you to do something about the divergence after the merge.
Another kind of rename conflict occurs when two people choose to rename different source files to the same destination. In this case, Mercurial runs its normal merge machinery, and lets you guide it to a suitable resolution.
Mercurial has a longstanding bug in which it fails to handle a merge where one side has a file with a given name, while another has a directory with the same name. This is documented as issue 29.
    $ hg init issue29
    $ cd issue29
    $ echo a > a
    $ hg ci -Ama
    adding a
    $ echo b > b
    $ hg ci -Amb
    adding b
    $ hg up 0
    0 files updated, 0 files merged, 1 files removed, 0 files unresolved
    $ mkdir b
    $ echo b > b/b
    $ hg ci -Amc
    adding b/b
    created new head
    $ hg merge
    abort: Is a directory: /tmp/issue29ozeAHx/issue29/b
Mercurial has some useful commands that will help you to recover from some common mistakes.
The hg revert command lets you undo changes that you have made to your working directory. For example, if you hg add a file by accident, just run hg revert with the name of the file you added, and while the file won't be touched in any way, it won't be tracked for adding by Mercurial any longer, either. You can also use hg revert to get rid of erroneous changes to a file.
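A minimal sketch of undoing an accidental hg add follows. After the revert, the file is still on disk with its contents intact, but hg status reports it as unknown again. The repository and file names, the temporary directory and the HGUSER identity are assumptions for illustration.

```shell
# Assumed identity so that commit works without a configured username.
export HGUSER='Example User <user@example.com>'
cd "$(mktemp -d)"                  # scratch directory (assumption)
hg init revert-example             # hypothetical repository name
cd revert-example
echo base > base.txt
hg add base.txt
hg commit -m 'Initial commit'
echo scratch > notes.txt
hg add notes.txt                   # the "accidental" add
added=$(hg status)                 # A notes.txt
hg revert notes.txt                # undo the add; the file is not touched
reverted=$(hg status)              # ? notes.txt
contents=$(cat notes.txt)          # scratch
echo "$added / $reverted / $contents"
```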
It is helpful to remember that the hg revert command is useful for changes that you have not yet committed. Once you've committed a change, if you decide it was a mistake, you can still do something about it, though your options may be more limited.
For more information about the hg revert command, and details about how to deal with changes you have already committed, see Chapter 9, Finding and fixing mistakes.
In a complicated or large project, it's not unusual for a merge of two changesets to result in some headaches. Suppose there's a big source file that's been extensively edited by each side of a merge: this is almost inevitably going to result in conflicts, some of which can take a few tries to sort out.
Let's develop a simple case of this and see how to deal with it. We'll start off with a repository containing one file, and clone it twice.
    $ hg init conflict
    $ cd conflict
    $ echo first > myfile.txt
    $ hg ci -A -m first
    adding myfile.txt
    $ cd ..
    $ hg clone conflict left
    updating working directory
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
    $ hg clone conflict right
    updating working directory
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
In one clone, we'll modify the file in one way.
    $ cd left
    $ echo left >> myfile.txt
    $ hg ci -m left
In another, we'll modify the file differently.
    $ cd ../right
    $ echo right >> myfile.txt
    $ hg ci -m right
Next, we'll pull each set of changes into our original repo.
    $ cd ../conflict
    $ hg pull -u ../left
    pulling from ../left
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
    $ hg pull -u ../right
    pulling from ../right
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files (+1 heads)
    not updating, since new heads added
    (run 'hg heads' to see heads, 'hg merge' to merge)
We expect our repository to now contain two heads.
    $ hg heads
    changeset:   2:adc157afe092
    tag:         tip
    parent:      0:b704237e4749
    user:        Bryan O'Sullivan <bos@serpentine.com>
    date:        Tue May 05 06:55:25 2009 +0000
    summary:     right

    changeset:   1:64abc51bea49
    user:        Bryan O'Sullivan <bos@serpentine.com>
    date:        Tue May 05 06:55:25 2009 +0000
    summary:     left
Normally, if we run hg merge at this point, it will drop us into a GUI that will let us manually resolve the conflicting edits to myfile.txt. However, to simplify things for presentation here, we'd like the merge to fail immediately instead. Here's one way we can do so.
    $ export HGMERGE=merge
We've told Mercurial's merge machinery to run the command merge whenever it detects a merge that it can't sort out automatically. Unlike a GUI tool, merge does not wait for input: as we desire, it reports the conflict and fails immediately.
If we now fire up hg merge, it should grind to a halt and report a failure.
    $ hg merge
    merging myfile.txt
    merge: warning: conflicts during merge
    merging myfile.txt failed!
    0 files updated, 0 files merged, 0 files removed, 1 files unresolved
    use 'hg resolve' to retry unresolved file merges or 'hg up --clean' to abandon
Even if we don't notice that the merge failed, Mercurial will prevent us from accidentally committing the result of a failed merge.
    $ hg commit -m 'Attempt to commit a failed merge'
    abort: unresolved merge conflicts (see hg resolve)
When hg commit fails in this case, it suggests that we use the unfamiliar hg resolve command. As usual, hg help resolve will print a helpful synopsis.
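When a merge occurs, most files will usually remain unmodified. For each file where Mercurial has to do something, it tracks the state of the file: a resolved file has been merged successfully, either automatically or by hand, while an unresolved file was not merged successfully and needs more attention.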
If Mercurial sees any file in the unresolved state after a merge, it considers the merge to have failed. Fortunately, we do not need to restart the entire merge from scratch.
The --list or -l option to hg resolve prints out the state of each merged file.
    $ hg resolve -l
    U myfile.txt
In the output from hg resolve, a resolved file is marked with R, while an unresolved file is marked with U. If any files are listed with U, we know that an attempt to commit the results of the merge will fail.
We have several options to move a file from the unresolved into the resolved state. By far the most common is to rerun hg resolve. If we pass the names of individual files or directories, it will retry the merges of any unresolved files present in those locations. We can also pass the --all or -a option, which will retry the merges of all unresolved files.
Mercurial also lets us modify the resolution state of a file directly. We can manually mark a file as resolved using the --mark option, or as unresolved using the --unmark option. This allows us to clean up a particularly messy merge by hand, and to keep track of our progress with each file as we go.
The default output of the hg diff command is backwards compatible with the regular diff command, but this has some drawbacks.
Consider the case where we use hg rename to rename a file.
    $ hg rename a b
    $ hg diff
    diff -r 4b300eaa7199 a
    --- a/a Tue May 05 06:55:21 2009 +0000
    +++ /dev/null Thu Jan 01 00:00:00 1970 +0000
    @@ -1,1 +0,0 @@
    -a
    diff -r 4b300eaa7199 b
    --- /dev/null Thu Jan 01 00:00:00 1970 +0000
    +++ b/b Tue May 05 06:55:21 2009 +0000
    @@ -0,0 +1,1 @@
    +a
The output of hg diff above obscures the fact that we simply renamed a file. The hg diff command accepts an option, --git or -g, to use a newer diff format that displays such information in a more readable form.
    $ hg diff -g
    diff --git a/a b/b
    rename from a
    rename to b
This option also helps with a case that can otherwise be confusing: a file that appears to be modified according to hg status, but for which hg diff prints nothing. This situation can arise if we change the file's execute permissions.
    $ chmod +x a
    $ hg st
    M a
    $ hg diff
The normal diff command pays no attention to file permissions, which is why hg diff prints nothing by default. If we supply it with the -g option, it tells us what really happened.
    $ hg diff -g
    diff --git a/a b/a
    old mode 100644
    new mode 100755
Revision control systems are generally best at managing text files that are written by humans, such as source code, where the files do not change much from one revision to the next. Some centralized revision control systems can also deal tolerably well with binary files, such as bitmap images.
For instance, a game development team will typically manage both its source code and all of its binary assets (e.g. geometry data, textures, map layouts) in a revision control system.
Because it is usually impossible to merge two conflicting modifications to a binary file, centralized systems often provide a file locking mechanism that allows a user to say “I am the only person who can edit this file”.
Compared to a centralized system, a distributed revision control system changes some of the factors that guide decisions over which files to manage and how.
For instance, a distributed revision control system cannot, by its nature, offer a file locking facility. There is thus no built-in mechanism to prevent two people from making conflicting changes to a binary file. If you have a team where several people may be editing binary files frequently, it may not be a good idea to use Mercurial—or any other distributed revision control system—to manage those files.
When storing modifications to a file, Mercurial usually saves only the differences between the previous and current versions of the file. For most text files, this is extremely efficient. However, some files (particularly binary files) are laid out in such a way that even a small change to a file's logical content results in many or most of the bytes inside the file changing. For instance, compressed files are particularly susceptible to this. If the differences between each successive version of a file are always large, Mercurial will not be able to store the file's revision history very efficiently. This can affect both local storage needs and the amount of time it takes to clone a repository.
To get an idea of how this could affect you in practice, suppose you want to use Mercurial to manage an OpenOffice document. OpenOffice stores documents on disk as compressed zip files. Edit even a single letter of your document in OpenOffice, and almost every byte in the entire file will change when you save it. Now suppose that file is 2MB in size. Because most of the file changes every time you save, Mercurial will have to store all 2MB of the file every time you commit, even though from your perspective, perhaps only a few words are changing each time. A single frequently-edited file that is not friendly to Mercurial's storage assumptions can easily have an outsized effect on the size of the repository.
Even worse, if both you and someone else edit the OpenOffice document you're working on, there is no useful way to merge your work. In fact, there isn't even a good way to tell what the differences are between your respective changes.
There are thus a few clear recommendations about specific kinds of files to be very careful with.
Files that are very large and incompressible, e.g. ISO CD-ROM images, will by virtue of sheer size make clones over a network very slow.
Files that change a lot from one revision to the next may be expensive to store if you edit them frequently, and conflicts due to concurrent edits may be difficult to resolve.
Since Mercurial maintains a complete copy of history in each clone, everyone who uses Mercurial to collaborate on a project can potentially act as a source of backups in the event of a catastrophe. If a central repository becomes unavailable, you can construct a replacement simply by cloning a copy of the repository from one contributor, and pulling any changes they may not have seen from others.
It is simple to use Mercurial to perform off-site backups and remote mirrors. Set up a periodic job (e.g. via the cron command) on a remote server to pull changes from your master repositories every hour. This will only be tricky in the unlikely case that the number of master repositories you maintain changes frequently, in which case you'll need to do a little scripting to refresh the list of repositories to back up.
If you perform traditional backups of your master repositories to tape or disk, and you want to back up a repository named myrepo, use hg clone -U myrepo myrepo.bak to create a clone of myrepo before you start your backups. The -U option doesn't check out a working directory after the clone completes, since that would be superfluous and make the backup take longer.
If you then back up myrepo.bak instead of myrepo, you will be guaranteed to have a consistent snapshot of your repository that won't be pushed to by an insomniac developer in mid-backup.