Linux Filesystem Backups

Backup is an essential responsibility that comes with owning a computer, but it is more honored in the breach than in the observance.

Echoing what I said in MySQL Backups, some situations may require more elaborate techniques, but these scripts are “good enough” for my needs. I hope you find them useful, and I welcome comments, critiques, or suggestions for alternative methods. Note: these are just examples. If you use them, you do so at your own risk, and you’ll want to adjust them for your own situation.

I run these scripts ad hoc (via the at command) rather than via cron, for a couple of reasons. First, they run on my laptop, so it may be down, or the backup media may not be available. Second, my personal schedule is very irregular, and I prefer to back up when the system is quiescent. I’ve found the most convenient time for incremental backups is while I’m getting ready for work. I do full backups sometime over the weekend.

My schedule, though not rigidly adhered to, is to run an incremental every morning. Because the archive name includes the day of the week, each run replaces the archive from the same weekday a week earlier, so I generally have seven copies available, although not necessarily from the most recent seven days. Currently, each incremental contains every file modified within the last 30 days.
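Both rules are easy to see in isolation: the weekday comes from `date +%a`, and the 30-day window is the `-mtime -30` test the incremental script adds to `find` (see the diff further down). The demo below is a hypothetical sketch, not part of the author's scripts; the function names are made up, and it assumes GNU `date`, `touch`, and `find`.

```shell
#!/bin/sh
# Hypothetical demo of the incremental naming and selection rules.
# Assumes GNU date, touch and find.

# archive base name in the pattern seen in the sample log,
# e.g. joule-incr-Wed-v0000 (the weekday repeats every 7 days)
incr_name() {
    printf '%s-incr-%s-v0000\n' "$(hostname -s)" "$(date +%a)"
}

# list regular files under DIR modified within the last DAYS days,
# using the same -mtime test the incr-backup script adds to its find
recent_files() {
    find "$1" -xdev -type f -mtime "-$2" -print
}
```

For example, `recent_files /etc 30` lists what tomorrow's incremental would pick up from `/etc`.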

Full backups use a similar naming scheme, which I should fix at some point. I pick one of the weekend backups each month to preserve long term, and rename it to indicate the specific date. I try to keep about a year’s worth of full backups available.
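The monthly rename can be scripted so the date comes from the archive itself rather than from memory. This is a hypothetical helper, not the author's: it assumes the weekly full archive is named like the logs in the sample (host, type, weekday, volume) with a `.cpio` suffix, which is a guess since the post never shows the archive's name, and it relies on GNU `date -r`.

```shell
#!/bin/sh
# preserve_full ARCHIVE
# Hypothetical helper: rename a weekly full backup such as
# joule-full-Sun-v0000.cpio to joule-full-2008-07-06.cpio, using the
# archive's own modification date, so next weekend's run can't reuse the name.
# The .cpio suffix is an assumption.
preserve_full() {
    arc=$1
    dir=$(dirname "$arc")
    stamp=$(date -r "$arc" +%Y-%m-%d)     # GNU date: -r reports the file's mtime
    # swap the "-Sun-v0000" part of the name for the date
    new=$(basename "$arc" | sed "s/-[A-Z][a-z][a-z]-v[0-9]*/-$stamp/")
    if [ -e "$dir/$new" ]; then
        echo "preserve_full: $dir/$new already exists" >&2
        return 1
    fi
    mv "$arc" "$dir/$new" && echo "$dir/$new"
}
```

Refusing to overwrite an existing dated archive is deliberate: two fulls from the same day are more likely a mistake than a choice.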

The first file backed up is /backid.txt; it is also included in the emailed log file. The create-backid-file script (below) fills it with information that is useful when recovering a system, such as the output of fdisk -l and the contents of /etc/fstab. Because I use cpio, this text can be viewed directly from the archive file with less, head, or even cat. That could be handy if you have limited resources to work with during a recovery.
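This works because cpio's crc format (the SVR4 portable format with a checksum) writes each member as a 110-byte ASCII header, the file name, and then the file's bytes unmodified. The helper below is a hypothetical sketch of that recovery-time trick: it reads the first few KB with `head` and deletes non-printing bytes so the NUL padding doesn't upset the terminal. The 16 KB default is an arbitrary assumption; pick anything larger than your backid.txt.

```shell
#!/bin/sh
# peek_backid ARCHIVE [BYTES]
# Hypothetical helper: print the start of a cpio archive whose first member
# is backid.txt, without using cpio at all. The member's header and name show
# up as a run of hex digits glued to the first line of text; NUL padding and
# other non-printing bytes are deleted. BYTES defaults to 16 KB (arbitrary).
peek_backid() {
    head -c "${2:-16384}" "$1" | tr -cd '[:print:]\t\n'
}
```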

After the backup itself completes, a verify-crc pass is made over the archive. This may not be as reassuring as an actual file-by-file comparison, but it seems a good tradeoff in my situation between reliability and time. An occasional file restore should still be done to confirm that what you think is getting backed up really is. I’ve also done bare-metal restores from these archives without problems.
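A spot-check restore can be as small as pulling one file back out into a scratch directory and comparing it with the live copy. The function below is a hypothetical sketch, not one of the author's scripts. It assumes GNU cpio, an archive written with `-H crc` after `cd /` as in the full-backup script, and a live file that hasn't changed since the backup. The block size should match the `-C` value used when writing (32768 in the sample log).

```shell
#!/bin/sh
# spot_check ARCHIVE PATH [BLOCKSIZE]
# Hypothetical sketch: restore one file from a full-backup style archive into
# a scratch directory and compare it byte-for-byte with the live copy under /.
# PATH must be spelled as it appears in the .toc listing (cpio -t output).
# A file that isn't in the archive also reports MISMATCH.
spot_check() {
    arc=$1 path=$2 bsize=${3:-32768}
    case $arc in /*) ;; *) arc=$PWD/$arc ;; esac    # we cd below
    scratch=$(mktemp -d) || return 1
    # -i extract, -d create leading directories, --quiet: no block count
    if ! ( cd "$scratch" &&
           cpio -i -d --quiet -H crc -C "$bsize" -I "$arc" "$path" ); then
        rm -rf "$scratch"
        return 1
    fi
    if cmp -s "$scratch/$path" "/$path"; then
        echo "OK: $path"; rc=0
    else
        echo "MISMATCH: $path"; rc=1
    fi
    rm -rf "$scratch"
    return $rc
}
```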

The actual scripts are listed later, but first I’d like to point out a few things from the log file.

  • the backid.txt file is included
  • line counts from the backup pass and the verify pass are included; the vast majority of the time, these are the same
  • the block counts from the two passes are included; these should always be the same
  • the size of the archive is calculated
  • the logs are kept locally for review as well as being copied to the backup media
  • timestamps are used at important steps so I can develop a feel for how the scripts perform
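The script prints these counts for a human to eyeball. If you'd rather have a mismatch flagged automatically, a check along these lines could run after the verify pass. It is a hypothetical sketch that assumes, as in the sample log, that the last line of both the .log and the .toc file is cpio's `N blocks` message.

```shell
#!/bin/sh
# check_counts LOG TOC
# Hypothetical sketch: compare the trailing "N blocks" lines of the backup
# log and the verify listing. Returns 1 if the block counts differ (they
# should always match). Line counts are only reported, since they can
# occasionally differ without anything being wrong.
check_counts() {
    log_blocks=$(tail -n 1 "$1" | cut -d' ' -f1)
    toc_blocks=$(tail -n 1 "$2" | cut -d' ' -f1)
    log_lines=$(wc -l < "$1")
    toc_lines=$(wc -l < "$2")
    echo "lines: log=$log_lines toc=$toc_lines"
    if [ "$log_blocks" = "$toc_blocks" ]; then
        echo "blocks: $log_blocks (match)"
    else
        echo "blocks: log=$log_blocks toc=$toc_blocks (MISMATCH)" >&2
        return 1
    fi
}
```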

Here’s a sample log:

Script:   incr-backup
Started:  Wed Jul  2 06:57:03 CDT 2008

[ ... deleted most of log ... ]

Wed Jul  2 08:14:19 CDT 2008 : testing/listing done

line counts from logs
   96944 /tmp/joule-incr-Wed-v0000.log
   96945 /tmp/joule-incr-Wed-v0000.toc

block counts (size=32768) from logs
==> /tmp/joule-incr-Wed-v0000.log <==
1145922 blocks

==> /tmp/joule-incr-Wed-v0000.toc <==
1145922 blocks

1145922 blocks @ 32768/block = 34.97 GB

Wed Jul  2 08:14:20 CDT 2008 : compressing logs

Wed Jul  2 08:14:22 CDT 2008 : moving logs to /usr/local/data/bcklogs
Wed Jul  2 08:14:22 CDT 2008 : copying logs to /media/MX200702082108

Script:    incr-backup
Started:   Wed Jul  2 06:57:03 CDT 2008
Finished:  Wed Jul  2 08:14:22 CDT 2008
Usage:     34.97 GB; 96945 files/dirs

It's kinda large because of a VMware disk image and a couple of DVD .iso files downloaded but not yet burned.
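The size figure is just the block count multiplied by the `-C` block size and divided by 1024 three times (so strictly speaking it is GiB). For the sample run, since 32768 = 2^15 and 1024^3 = 2^30:

```latex
\frac{1145922 \times 32768}{1024^{3}}
  = \frac{1145922 \times 2^{15}}{2^{30}}
  = \frac{1145922}{32768}
  \approx 34.97
```

Note that bc truncates to `scale` digits rather than rounding, so the printed value can come out a hundredth low; here the exact value is 34.9708..., so it makes no difference.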

This is the full-backup script (text). Things to note:

  • the DIRS variable specifies which filesystems to back up; the id file should be first in the list
  • nul-terminated paths (find -print0 feeding cpio -0, the 0 in -va0) are used to avoid issues with unusual filenames, such as names containing newlines
  • the VOLNBR is just a holdover from tape and could be removed
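#!/bin/sh
# @(#) $Id$

MyScript="`basename $0`"
MyHost="`hostname -s`"
MyStart="`date`"
BAR="----------------------------------------------------------------------"	# report separator (any line will do)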

# ----------------------------------------------------------------------------


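BTYPE=full			# incr-backup has BTYPE=incr
VOLNBR="0000"			# holdover from tape
DOW="`date +%a`"		# Sun, Mon, ...

# locations and sizes; values match the sample log, adjust for your system
BSIZE=32768			# cpio block size (-C)
WRKDIR=/tmp			# logs are built here during the run
ARCDIR=/usr/local/data/bcklogs	# local copies of the compressed logs
DSTDIR=/media/MX200702082108	# backup drive mount point

# file names, e.g. joule-full-Sun-v0000.log
BASE="$MyHost-$BTYPE-$DOW-v$VOLNBR"
BCK="$BASE.cpio"		# archive; the .cpio suffix is an assumption
LOG="$BASE.log"			# cpio -o verbose output (names + block count)
TOC="$BASE.toc"			# cpio -t listing from the verify pass

IDFILE="backid.txt"	# really /backid.txt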

# ----------------------------------------------------------------------------

# directories to be backed up; command option specifies not to cross filesystems
# work is done after a "cd /", so the "./" prefix is relative to "/"

### just for testing...
### DIRS="$IDFILE ./boot"

DIRS="./$IDFILE ./boot ./ ./home ./data ./usr/local ./usr ./opt ./var"
# omitted: /tmp /media

# ----------------------------------------------------------------------------

echo ""
echo "$BAR"
echo "Script:   $MyScript"
echo "Started:  $MyStart"
echo "$BAR"

echo ""
echo "Contents of /$IDFILE:"
cat /$IDFILE

echo ""
echo "Mounted file systems:"
df -h

# ----------------------------------------------------------------------------

echo ""
echo "`date` : creating backup"
echo "  targets: $DIRS"

# stdout is empty (always?) when using the -O option of cpio
# all content comes from stderr being redirected to stdout

cd / &&
find $DIRS -xdev -depth -print0 |
cpio -o -va0 -H crc -C $BSIZE -O $DSTDIR/$BCK >$WRKDIR/$LOG 2>&1

# ----------------------------------------------------------------------------

echo ""
echo "`date` : testing and listing backup"

# block count is written to stderr, but can't just send stderr to stdout
# because the count appears to be emitted at random within stdout stream

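cpio -i -vt -H crc --only-verify-crc -C $BSIZE -I $DSTDIR/$BCK >$WRKDIR/$TOC 2>$WRKDIR/$TOC.err

# append stderr afterwards so the block count ends up as the last line of
# the listing, where the summary below expects it
cat $WRKDIR/$TOC.err >>$WRKDIR/$TOC
rm -f $WRKDIR/$TOC.err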

echo ""
echo "`date` : testing/listing done"

# ----------------------------------------------------------------------------

echo ""
echo "line counts from logs"
wc -l $WRKDIR/$LOG $WRKDIR/$TOC | head -2

echo ""
echo "block counts (size=$BSIZE) from logs"
tail -n 1 $WRKDIR/$LOG $WRKDIR/$TOC

echo ""
fdcnt=`wc -l $WRKDIR/$TOC | sed 's/^ *//' | cut -d' ' -f1`
blkcnt=`tail -n 1 $WRKDIR/$TOC | cut -d' ' -f1`
gbcnt=`echo "scale=2; $blkcnt * $BSIZE / 1024 / 1024 / 1024" | bc`
echo "$blkcnt blocks @ $BSIZE/block = $gbcnt GB"

# ----------------------------------------------------------------------------

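echo ""
echo "`date` : compressing logs"
gzip -f $WRKDIR/$LOG $WRKDIR/$TOC

echo ""
echo "`date` : moving logs to $ARCDIR"
[ -d $ARCDIR ] || mkdir -p $ARCDIR
mv -f $WRKDIR/$LOG.gz $WRKDIR/$TOC.gz $ARCDIR/
chmod u=rw,go= $ARCDIR/$LOG.gz $ARCDIR/$TOC.gz

echo "`date` : copying logs to $DSTDIR"
cp -p $ARCDIR/$LOG.gz $ARCDIR/$TOC.gz $DSTDIR/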

# ----------------------------------------------------------------------------


MyFinish="`date`"

echo ""
echo "$BAR"
echo "Script:    $MyScript"
echo "Started:   $MyStart"
echo "Finished:  $MyFinish"
echo "Usage:     $gbcnt GB; $fdcnt files/dirs"
echo "$BAR"

# ----------------------------------------------------------------------------
# end
# ----------------------------------------------------------------------------
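The reason for `-print0` and cpio's `-0` is that a newline is a legal character in a Linux file name, so a newline-separated list can turn one name into two. The throwaway demo below (not part of the backup; the function name is made up) counts the entries both ways for a directory holding one such name. It assumes GNU find, tr, and wc.

```shell
#!/bin/sh
# Demo: why the backup pipes find -print0 into cpio -0.
# Creates a scratch directory holding one file whose name contains a newline.
demo_print0() {
    dir=$(mktemp -d) || return 1
    touch "$dir/$(printf 'two\nlines')"
    # newline-delimited: the single odd name looks like two entries
    by_newline=$(find "$dir" -type f -print | wc -l)
    # NUL-delimited: count the NUL terminators instead
    by_nul=$(find "$dir" -type f -print0 | tr -cd '\0' | wc -c)
    rm -rf "$dir"
    echo "newline-separated entries: $by_newline"
    echo "NUL-separated entries: $by_nul"
}
```

Running `demo_print0` reports two newline-separated entries but only one NUL-separated entry, which is what cpio would actually need to archive.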

The incr-backup script (text) differs in only two lines; next time I edit the scripts I'll move the interval option to a variable.

$ diff full-backup incr-backup
< BTYPE=full
> BTYPE=incr
< find $DIRS -xdev -depth -print0 |
> find $DIRS -xdev -depth -mtime -30 -print0 |
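One way to fold both versions into a single script is sketched below. It is only an illustration, not the author's later version: the backup type is derived from the name the script is invoked as (so full-backup and incr-backup could be two links to one file), and `MTIME_DAYS` and `find_filter` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical sketch: one script, two names. The incremental window lives
# in a variable instead of being hard-coded in the find command.
MTIME_DAYS=${MTIME_DAYS:-30}      # hypothetical knob; 30 days as in the post

# print the extra find(1) arguments for a given invocation name
find_filter() {
    case $(basename "$1") in
        incr-*) printf '%s\n' "-mtime -$MTIME_DAYS" ;;
        full-*) ;;                # full backup: no filter
        *)      echo "unknown script name: $1" >&2; return 1 ;;
    esac
}

# in the backup script itself this might become:
#   BTYPE=${MyScript%%-*}                  # "full" or "incr"
#   FILTER=`find_filter $0` || exit 1
#   find $DIRS -xdev -depth $FILTER -print0 | cpio ...
# ($FILTER is deliberately unquoted so it splits into separate find arguments)
```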

Finally, the create-backid-file script (text). I've stripped the extra noise out of the script for display in this post; select the text link to get the whole thing. The boot loader configuration (grub.conf, or lilo.conf for LILO) should probably be added as well.

uname -a
cat /etc/fstab
df -h
fdisk -l
chkconfig --list
ps -ef
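For readers who want something runnable from the listing above, here is a minimal self-contained sketch. It is not the full create-backid-file: the function name, the output-path argument, the `###` section headers, and the boot loader file locations are assumptions. Commands missing on a given system (for example, chkconfig outside Red Hat-style distributions) simply leave an error message in their section, and `fdisk -l` needs root.

```shell
#!/bin/sh
# create_backid [OUTFILE]
# Hypothetical sketch of create-backid-file: run each recovery-related
# command and write its output, under a header, to OUTFILE (default /backid.txt).
create_backid() {
    out=${1:-/backid.txt}
    {
        echo "backid created $(date) on $(uname -n)"
        for cmd in "uname -a" "cat /etc/fstab" "df -h" "fdisk -l" \
                   "chkconfig --list" "ps -ef" \
                   "cat /boot/grub/grub.conf" "cat /etc/lilo.conf"; do
            echo ""
            echo "### $cmd"
            sh -c "$cmd" 2>&1     # keep going if a command is missing or fails
        done
    } > "$out"
}
```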

As an aside, check out the full-size version of the lead image: it's high-res and quite interesting, as is the Wikipedia article Hard disk drive, where I found it.

image: Paul R. Potts, SixHardDriveFormFactors.jpg, Wikimedia Commons

About hornlo

Geek. Curmudgeon
This entry was posted in programming.
