Abstract

This document describes a method for generating automatic rotating "snapshot"-style backups on a Unix-based system, with specific examples drawn from the author's GNU/Linux experience. Snapshot backups are a feature of some high-end industrial file servers; they create the illusion of multiple, full backups per day without the space or processing overhead. All of the snapshots are read-only, and are accessible directly by users as special system directories. It is often possible to store several hours, days, and even weeks' worth of snapshots with slightly more than 2x storage.
The method is built on the rsync program, which is installed by default on most Linux distributions. Properly configured, the method can also protect against hard disk failure and root compromises, or even back up a network of heterogeneous desktops automatically.
Ever accidentally delete or overwrite a file you were working on? Wouldn't it be nice if there were a /snapshot directory that you could go back to, which had complete images of the file system at semi-hourly intervals all day, then daily snapshots back a few days, and maybe a weekly snapshot too? What if every user could just go into that magical directory and copy deleted or overwritten files back into "reality", from the snapshot of choice, without any help from you? And what if that /snapshot directory were read-only, like a CD-ROM, so that nothing could touch it (except maybe root, but even then not directly)? Best of all, what if you could make all of that happen automatically, using only one extra, slightly-larger, hard disk?

In my lab, we have a proprietary NetApp file server which provides that sort of functionality to the end-users. It provides a lot of other things too, but it cost as much as a luxury SUV. It's quite appropriate for our heavy-use research lab, but it would be overkill for a home or small-office environment. But that doesn't mean small-time users have to do without!

I'll show you how I configured automatic, rotating snapshots on my $80 used Linux desktop machine (which is also a file, web, and mail server) using only a couple of one-page scripts and a few standard Linux utilities that you probably already have. I'll also propose a related strategy which employs one (or two, for the wisely paranoid) extra low-end machines for a complete, responsible, automated backup strategy that eliminates tapes and manual labor and makes restoring files as easy as "cp".

Using rsync to make a backup

The rsync utility is a very well-known piece of GPL'd software, written originally by Andrew Tridgell and Paul Mackerras. If you have a common Linux or UNIX variant, then you probably already have it installed.
Rsync's specialty is efficiently synchronizing file trees across a network, but it works fine on a single machine too.

Basics

Suppose you have a directory called source, and you want to back it up into the directory destination. To accomplish that, you'd use:

    rsync -a source/ destination/

(Note: I usually add the -v (verbose) flag too, so that rsync tells me what it's doing.) The result is the same as a plain recursive copy of source/ into destination/, except that it's much more efficient if there are only a few differences.

Trailing slashes do matter... sometimes

This isn't really an article about rsync, but I would like to take a momentary detour to clarify one potentially confusing detail about its use. You may be accustomed to commands that don't care about trailing slashes. For example, if a and b are two directories, then cp -a a b is equivalent to cp -a a/ b/. However, rsync does care about the trailing slash, but only on the source argument. For example, let a and b be two directories, with the file foo initially inside directory a. Then this command:

    rsync -a a b

produces b/a/foo, whereas this command:

    rsync -a a/ b

produces b/foo. The presence or absence of a trailing slash on the destination argument (b, in this case) has no effect.

Using the --delete flag

If a file was originally in both source/ and destination/ (from an earlier rsync, for example), and you delete it from source/, you probably want it to be deleted from destination/ on the next rsync. However, the default behavior is to leave the copy at destination/ in place. Assuming you want rsync to delete any file from destination/ that is not in source/, you'll need to use the --delete flag:

    rsync -a --delete source/ destination/

Be lazy: use cron

One of the toughest obstacles to a good backup strategy is human nature; if there's any work involved, there's a good chance backups won't happen. Fortunately, there's a way to harness human laziness: make cron do the work.
To run the rsync backup command from the previous section every morning at 4:20 AM, for example, edit the root cron table (as root):

    crontab -e

Then add the following line:

    20 4 * * * rsync -a --delete source/ destination/

Finally, save the file and exit. The backup will happen every morning at precisely 4:20 AM, and root will receive the output by email. In a real cron entry, you should use full path names (such as /usr/bin/rsync and /home/source/) to remove any ambiguity.

Incremental backups with rsync

Since making a full copy of a large filesystem can be a time-consuming and expensive process, it is common to make full backups only once a week or once a month, and store only changes on the other days. These are called "incremental" backups, and are supported by the venerable old dump and tar utilities, along with many others. However, you don't have to use tape as your backup medium; it is both possible and vastly more efficient to perform incremental backups with rsync. The most common way to do this is by using the rsync -b --backup-dir= combination.
That approach works, but I won't discuss it further, because there is a better way. If you're not familiar with hard links, though, you should first start with the following review.

Review of hard links

We usually think of a file's name as being the file itself, but really the name is a hard link. A given file can have more than one hard link to itself--for example, a directory has at least two hard links: the directory name and . (its own entry for itself). It also has one hard link from each of its sub-directories (the .. entry inside each one). If you have the stat utility installed on your machine, you can find out how many hard links a file has (along with a bunch of other information) with the command:

    stat filename

Hard links aren't just for directories--you can create more than one link to a regular file too. For example, if you have the file a, you can make a link called b:

    ln a b

Now, a and b are two names for the same file, as you can verify by seeing that they reside at the same inode (the inode number will be different on your machine):

    ls -i a
    232177 a
    ls -i b
    232177 b

So ln a b is roughly equivalent to cp a b, but there are several important differences:

1. The contents of the file are only stored once, so you don't use twice the space.
2. If you overwrite a by copying another file on top of it, you also overwrite b, unless you tell cp to unlink the destination before writing. You do this by running cp with the --remove-destination flag.

Removing a link is a different matter: rm doesn't really remove a file, it just removes that one link to it. A file's contents aren't truly removed until the number of links to it reaches zero. In a moment, we're going to make use of that fact, but first, here's a word about cp.

Using cp -al

In the previous section, it was mentioned that hard-linking a file is similar to copying it. It should come as no surprise, then, that the standard GNU coreutils cp command comes with a -l flag that causes it to create (hard) links instead of copies (it doesn't hard-link directories, though, which is good). Another handy switch for the cp command is -a (archive), which causes it to recurse through directories and preserve file owners, timestamps, and access permissions.
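The hard-link behaviour reviewed above, along with the -l and -a switches, can be tried out safely in a scratch directory. This sketch assumes GNU coreutils (for stat -c and cp -al); the file and directory names are illustrative only.

```shell
#!/bin/sh
# Hard-link demo in a scratch directory; assumes GNU coreutils (stat -c, cp -al).
# All file and directory names here are illustrative.
set -eu
work=$(mktemp -d)
cd "$work"

echo "original contents" > a
ln a b                        # b is a second name for the same inode

ls -i a b                     # both lines show the same inode number
stat -c '%h links: %n' a      # the link count is now 2

echo "appended" >> a          # a change made through one name...
cat b                         # ...is visible through the other

rm a                          # removes one link only
cat b                         # the contents are still reachable via b

# cp -al recreates a directory tree using hard links instead of copies.
mkdir -p tree/sub
echo data > tree/sub/file
cp -al tree tree.copy
stat -c '%h links: %n' tree/sub/file tree.copy/sub/file
```

The last command should report two links for each name, since tree/sub/file and tree.copy/sub/file are the same file; the directories themselves are newly created rather than linked.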
Together, the combination cp -al makes what appears to be a full copy of a directory tree, but is really just an illusion that takes almost no space. To the end-user, the only differences are that the illusion-copy takes almost no disk space and almost no time to generate.

Putting it all together

We can combine rsync and cp -al to create what appear to be multiple full backups of a filesystem without taking multiple disks' worth of space. In reality, the extra storage will be equal to the current size of source_directory/ plus the total size of the cha...