Backup & Compression
What Needs to Be Backed Up?
Not all data is equally important. Before planning a backup strategy, categorize what you have:
| Priority | Data types |
|---|---|
| Definitely yes | Business-critical data, system configuration files (/etc), user files (/home) |
| Maybe | Spool directories (printing, mail), log files (/var/log) - needed for history/audit |
| Probably not | Any software that can be easily reinstalled; well-managed systems minimize this |
| Definitely not | Pseudo-filesystems: /proc, /dev, /sys (kernel-generated at boot), swap space, /tmp |
Backup Methods
| Method | Description |
|---|---|
| Full | All files; slowest to create, fastest to restore |
| Incremental | Files changed since the last backup (full or incremental); fast to create, slower to restore |
| Differential | Files changed since the last full backup; middle ground |
| Multi-level incremental | Files changed since the previous backup at the same or lower level |
| User | Only files within a specific user’s home directory |
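The difference between incremental and differential selection can be illustrated with marker files and `find -newer`. This is a minimal sketch, not how any particular backup tool implements it: the stamp files, dates, and file names are all hypothetical, and the timestamps are set by hand so the result is repeatable.

```shell
#!/bin/sh
# Sketch: which files would an incremental vs. a differential backup pick up?
# All paths, dates, and file names here are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data"

# Fixed timestamps (touch -t CCYYMMDDhhmm) keep the demo deterministic.
touch -t 202401010000 "$work/data/a.txt" "$work/data/b.txt" "$work/data/c.txt"
touch -t 202401070000 "$work/full.stamp"    # Sunday: full backup taken
touch -t 202401080900 "$work/data/a.txt"    # Monday: a.txt modified
touch -t 202401082300 "$work/incr.stamp"    # Monday night: incremental taken
touch -t 202401090900 "$work/data/b.txt"    # Tuesday: b.txt modified

cd "$work/data"
# Differential: everything newer than the last FULL backup.
differential=$(find . -type f -newer "$work/full.stamp" | sort)
# Incremental: everything newer than the last backup of ANY kind.
incremental=$(find . -type f -newer "$work/incr.stamp" | sort)

printf 'Differential would copy:\n%s\n' "$differential"
printf 'Incremental would copy:\n%s\n' "$incremental"
```

On Tuesday night the differential set holds both changed files, while the incremental set holds only the file changed since Monday's run. That is why a restore from incrementals has to replay every backup in the chain.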
Backup Strategy
The simplest scheme: one full backup, then incremental backups of all changes thereafter.
Full backups take more time; restoring from many incremental backups is complex. A common approach: weekly full + daily incremental. This balances backup window time with restore complexity.
Critical reminder: Backup methods are useless without tested restore methods. Test your restores regularly.
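A restore drill can be as small as the following sketch: take a backup, restore it into a scratch directory, and compare the two trees. The directory layout and file names are made up for illustration; a real drill would point at an actual backup and a spare restore location.

```shell
#!/bin/sh
# Sketch of a restore drill: back up, restore into a scratch directory, compare.
# The directory layout and file names are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/source/etc" "$work/restore"
printf 'hostname=demo\n' > "$work/source/etc/app.conf"
printf 'important data\n' > "$work/source/report.txt"

# 1. Take the backup (relative paths via -C so it restores anywhere).
tar -czf "$work/backup.tar.gz" -C "$work/source" .

# 2. Restore into an empty scratch directory, never over the live data.
tar -xzf "$work/backup.tar.gz" -C "$work/restore"

# 3. Compare the restored tree against the original.
if diff -r "$work/source" "$work/restore" >/dev/null; then
    restore_ok=yes
    echo "Restore verified: trees are identical"
else
    restore_ok=no
    echo "Restore FAILED verification" >&2
fi
```

`diff -r` compares file contents only. Checking ownership and permissions as well would need an extra step, which this sketch leaves out.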
Backup Utilities Overview
| Tool | Use case |
|---|---|
| rsync | Network or local sync; sends only changed bytes; most versatile |
| tar | Archive creation with optional compression; standard for file packaging |
| cp | Basic local copy; no incremental, no network |
| dd | Raw disk/partition copy; also MBR/GPT backup |
| cpio | Alternative archiver; used by RPM internally |
| dump / restore | Filesystem-level backup; must restore to same filesystem type |
| mt | Tape positioning and control |
rsync - Smart Synchronization
rsync is the preferred backup tool. Unlike cp, it:
- Checks if files already exist at the destination
- Skips files unchanged in size and modification time
- Transfers only the parts of files that actually changed (binary delta algorithm)
- Works across the network via SSH
- Supports resumable transfers
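The skip-unchanged behavior in the list above can be observed on scratch directories. The sketch below assumes rsync is installed and uses hypothetical file names. It runs the same sync three times and records what rsync reports each time through `-i` (itemize changes).

```shell
#!/bin/sh
# Sketch: watch rsync skip unchanged files on a second run.
# Assumes rsync is installed; directory and file names are hypothetical.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/src" "$work/dst"
printf 'version 1\n' > "$work/src/a.txt"
printf 'static\n'    > "$work/src/b.txt"

# First run: both files are new at the destination, so both are copied.
# -a preserves times and permissions; -i itemizes each change made.
first=$(rsync -ai "$work/src/" "$work/dst/")

# Second run: size and modification time match, so no file is transferred.
second=$(rsync -ai "$work/src/" "$work/dst/")

# Change one file (its size changes), then sync again: only a.txt is sent.
printf 'version 2 with more text\n' > "$work/src/a.txt"
third=$(rsync -ai "$work/src/" "$work/dst/")

printf 'first run:\n%s\n\nsecond run:\n%s\n\nthird run:\n%s\n' "$first" "$second" "$third"
```

The second run lists no files because the quick check (size plus modification time) finds nothing to do. The third run lists only the file that changed.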
    # Basic syntax
    rsync sourcefile destinationfile

    # Recommended flags: -a (archive), -v (verbose), -z (compress), -P (progress + partial)
    rsync -avzP ./local-dir/ user@host:/remote/path/

    # Full sync with deletion (mirrors source to destination)
    rsync --progress -avrxH --delete sourcedir/ destdir/

    # Dry run first (strongly recommended before --delete runs)
    rsync --dry-run -avz sourcedir/ destdir/

    # Remote backup (the trailing slashes copy the directory's contents,
    # which avoids creating a nested project-X/project-X on the remote side)
    rsync -r project-X/ archive-machine:archives/project-X/

    # Remote restore
    rsync -avzP user@host:/remote/backup/ ./local-restore/

Compression Utilities
File data is compressed to save disk space and reduce transfer time. Linux compression tools make different speed/ratio tradeoffs:
| Tool | Extension | Speed | Ratio | Notes |
|---|---|---|---|---|
| gzip | .gz | Fast | Good | Most common; -9 for max compression |
| bzip2 | .bz2 | Slow | Better | Largely superseded by xz for new archives |
| xz | .xz | Slowest | Best | Used for Linux kernel archives; well-maintained |
| zip | .zip | Fast | OK | For Windows compatibility; rarely used natively on Linux |
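The ratio column can be checked on a sample file. The sketch below is only an illustration: the sample data is synthetic and highly repetitive, so the absolute numbers will be far better than for typical files, and any tool that is not installed is skipped.

```shell
#!/bin/sh
# Sketch: compare compressed sizes of one sample file.
# Tools that are not installed are skipped; the sample data is synthetic.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Repetitive text with a counter, so every tool compresses it well.
awk 'BEGIN { for (i = 0; i < 20000; i++) print "backup and compression sample line", i }' \
    > "$work/sample.txt"
orig_size=$(( $(wc -c < "$work/sample.txt") ))
echo "original: $orig_size bytes"

for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        # -c writes to stdout, leaving the original file in place.
        "$tool" -c "$work/sample.txt" > "$work/sample.$tool"
        size=$(( $(wc -c < "$work/sample.$tool") ))
        echo "$tool: $size bytes"
        # Round trip: decompressing must reproduce the original exactly.
        "$tool" -dc "$work/sample.$tool" | cmp -s - "$work/sample.txt"
    fi
done
```

Running it against a representative sample of your own data gives a more useful comparison than any general table.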
    gzip *                # compress all files in current dir (replaces originals with .gz)
    gzip -r projectX      # recursively compress all files under projectX/
    gzip -9 bigfile       # maximum compression
    gunzip foo.gz         # decompress (same as gzip -d)
    gzip -l archive.gz    # list compression ratio and sizes

    bzip2 *               # compress all files (replaces with .bz2)
    bunzip2 *.bz2         # decompress (same as bzip2 -d)

    xz *                  # compress all files (replaces with .xz)
    xz foo                # compress foo → foo.xz (removes original if successful)
    xz -dk bar.xz         # decompress bar.xz, keep bar.xz (-d decompress, -k keep)
    xz -d *.xz            # decompress all .xz files
    xz -dcf a.txt b.txt.xz > combined.txt   # decompress mix of compressed/uncompressed

    zip backup *          # compress all files into backup.zip
    zip -r backup.zip ~   # archive home directory recursively
    unzip backup.zip      # extract all files
    unzip -l backup.zip   # list contents without extracting

tar - Archive and Compress
tar (originally tape archive) creates archives (tarballs). It can also compress inline using -z (gzip), -j (bzip2), or -J (xz).
Common Options
| Flag | Meaning |
|---|---|
| c | Create archive |
| x | Extract archive |
| v | Verbose (list files as processed) |
| f | Filename follows |
| z | Use gzip |
| j | Use bzip2 |
| J | Use xz |
| t | List contents |
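The flags in the table combine into a single option cluster. The sketch below exercises the create, list, and extract modes on a throwaway directory tree. The directory and file names are hypothetical, and gzip is used only because it is the most widely installed compressor.

```shell
#!/bin/sh
# Sketch: the c, t, and x modes on a throwaway directory tree.
# File and directory names are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p mydir/sub
echo 'one' > mydir/one.txt
echo 'two' > mydir/sub/two.txt

# c = create, z = gzip, f = archive name follows
tar -czf archive.tar.gz mydir/

# t = list contents without extracting
listing=$(tar -tzf archive.tar.gz)
echo "$listing"

# x = extract; -C switches to the target directory first
mkdir target
tar -xzf archive.tar.gz -C target

# Extract a single member by giving its path as stored in the archive
mkdir single
tar -xzf archive.tar.gz -C single mydir/sub/two.txt
```

Listing with `t` before extracting shows the exact member paths, which is what single-file extraction needs.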
Common Commands
    tar cvf archive.tar mydir/          # create uncompressed archive
    tar zcvf archive.tar.gz mydir/      # create gzip-compressed
    tar jcvf archive.tar.bz2 mydir/     # create bzip2-compressed
    tar Jcvf archive.tar.xz mydir/      # create xz-compressed (best ratio)

    tar xvf archive.tar.gz                  # extract (tar auto-detects compression)
    tar xvf archive.tar.gz -C /target/dir   # extract to specific directory
    tar tvf archive.tar.gz                  # list contents without extracting

Incremental Backups with tar
    tar --create --newer '2024-01-01' -vzf incremental.tgz /var/tmp
    tar --create --after-date '2024-01-01' -vzf incremental.tgz /var/tmp

Both create an archive of files in /var/tmp modified after January 1, 2024.
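Limitation: this selection is based on timestamps alone. GNU tar's --newer compares a file's modification and status-change times against the given date, while --newer-mtime looks only at the modification time, so any change that leaves the examined timestamps untouched is missed. Use find to build a file list if you need other selection criteria.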
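A find-built file list might look like the sketch below. It assumes a tar that supports `-T` (`--files-from`), which GNU tar and bsdtar both do. The paths, dates, and the `cutoff.stamp` marker file are hypothetical.

```shell
#!/bin/sh
# Sketch: let find choose the files, then hand the list to tar.
# Assumes a tar with -T/--files-from (GNU tar and bsdtar both have it).
# Paths, dates, and file names are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir data
touch -t 202312150000 data/old.txt      # last changed before the cutoff
touch -t 202401150000 data/new.txt      # changed after the cutoff
touch -t 202401010000 cutoff.stamp      # reference point: 2024-01-01

# find offers many more tests than a date check (-perm, -user, -name, ...).
find data -type f -newer cutoff.stamp > filelist.txt

# -T reads the names of the files to archive from the list, one per line.
tar -czf incremental.tgz -T filelist.txt

contents=$(tar -tzf incremental.tgz)
echo "$contents"
```

A newline-separated list breaks on file names that contain newlines. Where that matters, `find ... -print0` piped into `tar --null -T -` is the usual workaround.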
dd - Raw Disk Copy
dd copies raw data between devices. It bypasses the filesystem and works directly at the block level.
    # Back up the MBR (bootloader + partition table - first 512 bytes)
    sudo dd if=/dev/sda of=sda.mbr bs=512 count=1

    # Restore the MBR from that backup
    sudo dd if=sda.mbr of=/dev/sda bs=512 count=1

    # Clone one disk to another (DESTRUCTIVE - wipes destination)
    sudo dd if=/dev/sda of=/dev/sdb

    # Create a disk image
    sudo dd if=/dev/sda of=disk.img bs=4M status=progress

- if= - input file (or device)
- of= - output file (or device)
- bs= - block size (larger blocks are usually faster, up to memory limits)
- count= - number of blocks to copy
The name dd is often joked to stand for “disk destroyer” - treat it accordingly.
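Because a mistyped of= can destroy a disk, it helps to rehearse on an ordinary file first. The sketch below treats a small image file as a stand-in for a disk, saves its first 512-byte block, damages it, and restores it. No real device is touched, and all file names are hypothetical.

```shell
#!/bin/sh
# Sketch: rehearse a first-sector backup and restore on an image FILE.
# No real device is touched; all file names are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A 64 KiB stand-in for a disk, filled with random bytes.
dd if=/dev/urandom of=disk.img bs=1024 count=64 2>/dev/null

# Back up the first 512-byte block, as in the MBR example.
dd if=disk.img of=sector0.bak bs=512 count=1 2>/dev/null

# Simulate damage: overwrite the first block of a copy with zeros.
# conv=notrunc matters for regular files; without it dd truncates the output.
cp disk.img damaged.img
dd if=/dev/zero of=damaged.img bs=512 count=1 conv=notrunc 2>/dev/null

# Restore the saved block and confirm the image matches the original again.
dd if=sector0.bak of=damaged.img bs=512 count=1 conv=notrunc 2>/dev/null
if cmp -s disk.img damaged.img; then
    echo "first sector restored; images are identical"
fi
backup_size=$(( $(wc -c < sector0.bak) ))
```

Block devices are not truncated the way regular files are, so `conv=notrunc` is needed here only because the rehearsal writes into a file.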
GPT Disks
For GPT partition tables, use sgdisk instead:
    sudo sgdisk -p /dev/sda                       # display partition table
    sudo sgdisk --backup=gpt.bak /dev/sda         # backup GPT
    sudo sgdisk --load-backup=gpt.bak /dev/sda    # restore

Backup Programs
Full-featured backup suites for enterprise use:
| Tool | Description |
|---|---|
| Amanda | Advanced Maryland Automatic Network Disk Archiver; uses tar/dump internally; robust and scriptable |
| Bacula | Complex but powerful heterogeneous network backup; best for experienced admins |
| Clonezilla | Disk imaging and deployment; supports many OS types and filesystems; available in Live and Server editions |