
RSnapshot Linux Backup Tool

06/10/09

05:42:24 pm by guy, Categories: Linux, NAS, Tags: backup, nas, rsnapshot

I blogged the other day about having set up a Xubuntu/Ubuntu machine to back up my FreeNAS-based NAS server. Probably my favorite new tool is RSnapshot. RSnapshot is a script that uses rsync to create snapshot-based backups. Snapshot backups are a technique where you first generate a full backup of a data set; each subsequent backup then starts from a copy of the previous backup that consists entirely of hard links to the same files, and only the files that have changed get replaced. If you don’t understand hard links then this can be a bit difficult to follow, but the effect is very cool. I’m not sure I can explain it properly, but I’ll try.
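If hard links are new to you, a tiny experiment may help before going further. This is just a sketch with made-up file names in a throwaway directory, and it assumes GNU `stat` (the `-c` option), which is what Ubuntu ships:

```shell
#!/bin/sh
# Minimal hard-link demo in a throwaway directory (all names are made up).
set -eu
demo=$(mktemp -d)

echo "original contents" > "$demo/original.txt"
ln "$demo/original.txt" "$demo/linked.txt"    # a second name for the same inode

# Both names report the same inode number and a link count of 2.
inode_a=$(stat -c %i "$demo/original.txt")
inode_b=$(stat -c %i "$demo/linked.txt")
links=$(stat -c %h "$demo/original.txt")
echo "inodes: $inode_a $inode_b, link count: $links"

# Removing one name leaves the data reachable through the other.
rm "$demo/original.txt"
survivor=$(cat "$demo/linked.txt")
echo "after rm: $survivor"

rm -rf "$demo"
```

The data only goes away when the last name pointing at it is removed, which is the property the snapshot scheme relies on.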

First you do a full backup. Let’s assume we are only doing daily backups and we want 7 days available at all times. Once you have configured rsnapshot.conf with your sources and destinations, you kick it off with the command ‘rsnapshot daily’. The first backup will be ‘daily.0’. This is the backup that takes hours, because every file is physically copied from your source to the backup location. Let’s say we start with 100 GB of data; this initial backup might take several hours, depending on whether the source is local or reached over the network.

Now the next ‘rsnapshot daily’ is executed to back up the same data. When it runs, it first makes a hard-link copy of the entire ‘daily.0’ backup to ‘daily.1’. Here is the command it executes on my machine:
/bin/cp -al /mnt/Backup1/daily.0 /mnt/Backup1/daily.1
Because it is a hard-link copy, no file data is actually copied, but if you browse both directory trees you will find identical files in both locations. The truth is that they are literally the SAME files. Both directory trees point to the same disk locations, so almost no additional disk space is used (only the new directory entries). You have still only used ~100 GB of disk space. Next the script executes rsync to do the next backup of the source data. This is where it gets cool. rsync runs against ‘daily.0’, which now shares all of its files with ‘daily.1’, and it only copies changed files and removes deleted ones. When rsync deletes or modifies a file in ‘daily.0’, it only breaks the hard link on the ‘daily.0’ side (a changed file is written out as a new file rather than overwritten in place), so the file still exists in its original form in ‘daily.1’. New files are also added, but there are no existing links to break for those.
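Here is a toy version of that cycle you can run safely. The directory names mirror mine but live in a temp directory, and the "write a temp file, then rename it" step is my stand-in for what rsync does by default when it updates a file, so treat it as an illustration rather than what rsnapshot literally runs. It assumes GNU `cp` and `stat`:

```shell
#!/bin/sh
# Simulates one snapshot cycle on toy data. The temp-file-plus-rename step
# stands in for rsync (an assumption, so the sketch runs without rsync).
set -eu
root=$(mktemp -d)
mkdir -p "$root/daily.0"
echo "version 1" > "$root/daily.0/changed.txt"
echo "keep me"   > "$root/daily.0/unchanged.txt"
echo "doomed"    > "$root/daily.0/deleted.txt"

# Step 1: hard-link copy, same form as the cp command shown above.
cp -al "$root/daily.0" "$root/daily.1"

# Step 2: update daily.0. The new version is written to a temporary file and
# renamed over the old name, so daily.0 gets a new inode and daily.1 keeps
# the old one.
echo "version 2" > "$root/daily.0/.changed.txt.tmp"
mv "$root/daily.0/.changed.txt.tmp" "$root/daily.0/changed.txt"
rm "$root/daily.0/deleted.txt"    # what --delete does for files removed at the source

new_version=$(cat "$root/daily.0/changed.txt")
old_version=$(cat "$root/daily.1/changed.txt")
old_deleted=$(cat "$root/daily.1/deleted.txt")
shared_links=$(stat -c %h "$root/daily.0/unchanged.txt")
echo "daily.0 has '$new_version', daily.1 still has '$old_version' and '$old_deleted'"
echo "unchanged.txt link count: $shared_links"

rm -rf "$root"
```

One thing to notice: overwriting a file in place (for example with `>`) would change it in both snapshots, because both names point at the same data. The rename is what keeps the old snapshot intact.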

The effect of this is that only new or changed files are actually copied and occupy additional disk space. Deleted files are removed from ‘daily.0’ but still exist in any previous backup, so no disk space is freed immediately. If you had 100 MB of new files and 100 MB of deleted files, your backup would grow by about 100 MB.

So the next day we execute ‘rsnapshot daily’ again, and the process repeats in roughly the same manner each day until you reach the configured number of snapshots to retain. Let’s say 7. Once ‘daily.0’ through ‘daily.6’ all exist, each new run starts by deleting the oldest snapshot, ‘daily.6’. Only when a full snapshot is deleted do you have the potential to decrease disk usage. As an example, say you back up a 100 MB file on day one and delete it the next day before the next backup. That 100 MB file will continue to exist until about 7 days later, when the last snapshot containing a reference to it gets deleted.
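The rotation itself can be sketched in a few lines of shell. To be clear, this is my own approximation of the behavior described above: the function name `rotate_snapshots` is hypothetical, rsnapshot itself is a Perl script, and the rsync step is replaced by writing a small marker file.

```shell
#!/bin/sh
# Sketch of the rotation only. rotate_snapshots is a hypothetical helper,
# not part of rsnapshot; it assumes a retention count of at least 2.
set -eu

rotate_snapshots() {
    snap_root=$1
    retain=$2
    oldest=$((retain - 1))

    # Drop the oldest snapshot once the retention limit has been reached.
    if [ -d "$snap_root/daily.$oldest" ]; then
        rm -rf "$snap_root/daily.$oldest"
    fi

    # Shift the older snapshots up by one, starting from the oldest.
    i=$((oldest - 1))
    while [ "$i" -ge 1 ]; do
        if [ -d "$snap_root/daily.$i" ]; then
            mv "$snap_root/daily.$i" "$snap_root/daily.$((i + 1))"
        fi
        i=$((i - 1))
    done

    # Hard-link copy of the newest snapshot; rsync would then update daily.0.
    if [ -d "$snap_root/daily.0" ]; then
        cp -al "$snap_root/daily.0" "$snap_root/daily.1"
    else
        mkdir -p "$snap_root/daily.0"
    fi
}

root=$(mktemp -d)
run=1
while [ "$run" -le 10 ]; do
    rotate_snapshots "$root" 7
    # Stand-in for rsync: write a temp file and rename it, so older
    # snapshots keep their own copy of the marker.
    echo "run $run" > "$root/daily.0/.last_run.tmp"
    mv "$root/daily.0/.last_run.tmp" "$root/daily.0/last_run.txt"
    run=$((run + 1))
done

snapshot_count=$(ls -d "$root"/daily.* | wc -l | tr -d ' ')
newest=$(cat "$root/daily.0/last_run.txt")
oldest_kept=$(cat "$root/daily.6/last_run.txt")
echo "$snapshot_count snapshots kept; newest is '$newest', oldest is '$oldest_kept'"

rm -rf "$root"
```

After ten simulated days with a retention of 7, only seven directories remain and the three oldest runs are gone, which is the point at which disk space can actually be reclaimed.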

To quantify some of the benefits of this strategy, let’s look at a log file and the output of a disk usage command.

Here is the log from executing my 4th rsnapshot backup:

[07/Jun/2009:03:30:02] /usr/bin/rsnapshot daily: started
[07/Jun/2009:03:30:02] echo 11794 > /var/run/rsnapshot.pid
[07/Jun/2009:03:30:10] mv /mnt/Backup1/daily.2/ /mnt/Backup1/daily.3/
[07/Jun/2009:03:30:10] mv /mnt/Backup1/daily.1/ /mnt/Backup1/daily.2/
[07/Jun/2009:03:30:10] /bin/cp -al /mnt/Backup1/daily.0 /mnt/Backup1/daily.1
[07/Jun/2009:03:30:43] /usr/bin/rsync -av --delete --numeric-ids --relative --delete-excluded /mnt/FreeNAS /mnt/Backup1/daily.0/FreeNAS/
[07/Jun/2009:03:34:18] touch /mnt/Backup1/daily.0/
[07/Jun/2009:03:34:18] rm -f /var/run/rsnapshot.pid
[07/Jun/2009:03:34:18] /usr/bin/logger -i -p user.info -t rsnapshot /usr/bin/rsnapshot daily: completed successfully

You can see that the entire backup executed in only 4 minutes and 16 seconds, even though daily.0/FreeNAS contains what looks like a full backup. Likewise, daily.1/FreeNAS contains what looks like a full backup from the previous day. If you wanted to recover a file that you had deleted, you could browse to the appropriate daily.X directory and copy it back; it is that easy.

Now, if you execute the command ‘rsnapshot du’, it lists all of your snapshots and how much actual disk space each one uses. The first one, daily.0, will always appear to be the one that contains the full backup, and the rest of the directories will show only the differences, because du counts each hard-linked file once, under the first directory where it finds it. Here is the output from that command on my system:

rsnapshot du
require Lchown
Lchown module loaded successfully
/usr/bin/du -csh /mnt/Backup1/daily.0/ /mnt/Backup1/daily.1/ \
/mnt/Backup1/daily.2/ /mnt/Backup1/daily.3/ /mnt/Backup1/daily.4/ \
/mnt/Backup1/daily.5/ /mnt/Backup1/daily.6/

47G /mnt/Backup1/daily.0/
117M /mnt/Backup1/daily.1/
143M /mnt/Backup1/daily.2/
146M /mnt/Backup1/daily.3/
97M /mnt/Backup1/daily.4/
321M /mnt/Backup1/daily.5/
140M /mnt/Backup1/daily.6/
48G total
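You can reproduce the effect behind those numbers on a small scale. This is a sketch with made-up paths and sizes, not my real backup; it relies on du counting a hard-linked file only once per invocation:

```shell
#!/bin/sh
# Toy reproduction of the du effect; paths and sizes are made up.
set -eu
root=$(mktemp -d)
mkdir "$root/daily.0"
head -c 1048576 /dev/urandom > "$root/daily.0/big.bin"    # 1 MiB of data
cp -al "$root/daily.0" "$root/daily.1"
sync    # make sure the allocated blocks are visible to du

# One du invocation: the shared file is charged to daily.0 only.
du -sk "$root/daily.0" "$root/daily.1"

# Compare daily.1 measured in the combined run with daily.1 measured alone.
together_second=$(du -sk "$root/daily.0" "$root/daily.1" | sed -n 2p | cut -f1)
alone_second=$(du -sk "$root/daily.1" | cut -f1)
echo "daily.1 in a combined run: ${together_second}K, on its own: ${alone_second}K"

rm -rf "$root"
```

Measured on its own, daily.1 looks like a full copy; measured after daily.0 in the same run, it shows almost nothing, because the shared file has already been counted.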

Very cool. I love this Linux stuff; there is so much to learn and explore. I won’t leave my Windows desktop, but Linux is certainly a powerful tool for those willing and able to learn.

I didn’t even get into it, but rsnapshot can also keep hourly, weekly, and monthly snapshots in addition to the dailies. It can also back up multiple sources in each execution. Only the most frequent interval actually runs rsync; weekly and monthly snapshots are created by rotating the oldest snapshot of the next more frequent interval (for example, the oldest daily becomes ‘weekly.0’). There are many configuration options. The only drawback for someone like me from the Windows world is that there is no GUI; everything is configured in a text configuration file. This isn’t bad, but it can be intimidating.

I'm a generalist, at least if I'm honest. In my job I am primarily a developer, but also a sysadmin, and (as little as possible) technical support. I know a little about a lot of things, a lot about some things, and everything about nothing. Here I will post random learnings...
