
# Backup & Compression

Not all data is equally important. Before planning a backup strategy, categorize what you have:

| Priority | Data types |
| --- | --- |
| Definitely yes | Business-critical data, system configuration files (`/etc`), user files (`/home`) |
| Maybe | Spool directories (printing, mail), log files (`/var/log`) - needed for history/audit |
| Probably not | Any software that can be easily reinstalled; well-managed systems minimize this |
| Definitely not | Pseudo-filesystems: `/proc`, `/dev`, `/sys` (kernel-generated at boot), swap space, `/tmp` |

| Method | Description |
| --- | --- |
| Full | All files; slowest to create, fastest to restore |
| Incremental | Files changed since the last backup (full or incremental); fast to create, slower to restore |
| Differential | Files changed since the last full backup; middle ground |
| Multi-level incremental | Files changed since the previous backup at the same or lower level |
| User | Only files within a specific user's home directory |

The simplest scheme: one full backup, then incremental backups of all changes thereafter.

Full backups take more time; restoring from many incremental backups is complex. A common approach: weekly full + daily incremental. This balances backup window time with restore complexity.

Critical reminder: Backup methods are useless without tested restore methods. Test your restores regularly.
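The weekly-full/daily-incremental scheme can be sketched with GNU tar's snapshot file (`--listed-incremental`), which records what was already backed up. File and archive names below are illustrative, not prescriptive:

```sh
# Sketch of a weekly-full / daily-incremental scheme using GNU tar's
# snapshot file (--listed-incremental). Paths and names are illustrative.
mkdir -p data
echo "sunday" > data/a.txt

# Sunday: full backup; starting from a fresh snapshot file makes it level 0
rm -f backup.snar
tar --listed-incremental=backup.snar -czf full.tgz data

# Monday: only files changed since the snapshot are archived
echo "monday" > data/b.txt
tar --listed-incremental=backup.snar -czf incr-mon.tgz data

# Restore: extract the full backup first, then each incremental in order
mkdir -p restore
tar --listed-incremental=/dev/null -xzf full.tgz -C restore
tar --listed-incremental=/dev/null -xzf incr-mon.tgz -C restore
```

Restoring in order matters: each incremental archive only makes sense on top of the state left by the previous one.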


| Tool | Use case |
| --- | --- |
| rsync | Network or local sync; sends only changed bytes; most versatile |
| tar | Archive creation with optional compression; standard for file packaging |
| cp | Basic local copy; no incremental, no network |
| dd | Raw disk/partition copy; also MBR/GPT backup |
| cpio | Alternative archiver; used by RPM internally |
| dump / restore | Filesystem-level backup; must restore to same filesystem type |
| mt | Tape positioning and control |

rsync is the preferred backup tool. Unlike cp, it:

  • Checks if files already exist at the destination
  • Skips files unchanged in size and modification time
  • Transfers only the parts of files that actually changed (delta-transfer algorithm) when syncing over a network; local copies send whole files
  • Works across the network via SSH
  • Supports resumable transfers
```sh
# Basic syntax
rsync sourcefile destinationfile

# Recommended flags: -a (archive), -v (verbose), -z (compress), -P (progress + partial)
rsync -avzP ./local-dir/ user@host:/remote/path/

# Full sync with deletion (mirrors source to destination)
rsync --progress -avrxH --delete sourcedir/ destdir/

# Dry run first (strongly recommended before any --delete run)
rsync --dry-run -avz --delete sourcedir/ destdir/

# Remote backup (no trailing slash: copies the directory itself into archives/)
rsync -r project-X archive-machine:archives/

# Remote restore
rsync -avzP user@host:/remote/backup/ ./local-restore/
```
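Per the reminder above, a backup only counts once a restore has been verified. One way to sketch that check is a checksum-based dry run plus a recursive diff; `src` and `restored` are hypothetical stand-ins for the real trees:

```sh
# Sketch: verify that a restored tree matches the original.
# 'src' and 'restored' are hypothetical directory names.
mkdir -p src restored
echo "payload" > src/file.txt
cp -p src/file.txt restored/file.txt

# Dry-run rsync with checksums (-c) itemizes any file whose content differs;
# no output means the contents match (guarded in case rsync is absent)
command -v rsync >/dev/null && rsync -rcn --itemize-changes src/ restored/

# diff -r exits non-zero at the first difference
if diff -r src restored >/dev/null; then
    echo "restore verified"
fi
```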

File data is compressed to save disk space and reduce transfer time. Linux compression tools make different speed/ratio tradeoffs:

| Tool | Extension | Speed | Ratio | Notes |
| --- | --- | --- | --- | --- |
| gzip | .gz | Fast | Good | Most common; -9 for max compression |
| bzip2 | .bz2 | Slow | Better | Largely superseded; prefer xz |
| xz | .xz | Slowest | Best | Used for Linux kernel archives; well-maintained |
| zip | .zip | Fast | OK | For Windows compatibility; rarely used natively on Linux |
```sh
gzip *              # compress all files in current dir (replaces originals with .gz)
gzip -r projectX    # recursively compress all files under projectX/
gzip -9 bigfile     # maximum compression
gunzip foo.gz       # decompress (same as gzip -d)
gzip -l archive.gz  # list compression ratio and sizes
```
```sh
bzip2 *        # compress all files (replaces with .bz2)
bunzip2 *.bz2  # decompress (same as bzip2 -d)
```
```sh
xz *           # compress all files (replaces with .xz)
xz foo         # compress foo → foo.xz (removes original if successful)
xz -dk bar.xz  # decompress bar.xz, keep original (-d decompress, -k keep)
xz -d *.xz     # decompress all .xz files
xz -dcf a.txt b.txt.xz > combined.txt  # decompress a mix of compressed/uncompressed files to stdout
```
```sh
zip backup *         # compress all files into backup.zip
zip -r backup.zip ~  # archive home directory recursively
unzip backup.zip     # extract all files
unzip -l backup.zip  # list contents without extracting
```
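The speed/ratio tradeoffs in the table are easy to see empirically by compressing the same input with each tool. A minimal sketch on repetitive sample data (absolute sizes depend on the input; only the ordering is typical):

```sh
# Sketch: compare gzip and xz on the same highly repetitive input.
yes "lorem ipsum dolor sit amet" | head -n 10000 > sample.txt

gzip -k9 sample.txt  # -k keeps the original alongside sample.txt.gz
xz -k9 sample.txt    # likewise produces sample.txt.xz

ls -l sample.txt sample.txt.gz sample.txt.xz
```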

tar (originally tape archive) creates archives (tarballs). It can also compress inline using -z (gzip), -j (bzip2), or -J (xz).

| Flag | Meaning |
| --- | --- |
| c | Create archive |
| x | Extract archive |
| v | Verbose (list files as processed) |
| f | Filename follows |
| z | Use gzip |
| j | Use bzip2 |
| J | Use xz |
| t | List contents |
```sh
tar cvf archive.tar mydir/             # create uncompressed archive
tar zcvf archive.tar.gz mydir/         # create gzip-compressed
tar jcvf archive.tar.bz2 mydir/        # create bzip2-compressed
tar Jcvf archive.tar.xz mydir/         # create xz-compressed (best ratio)
tar xvf archive.tar.gz                 # extract (GNU tar auto-detects compression)
tar xvf archive.tar.gz -C /target/dir  # extract to a specific directory
tar tvf archive.tar.gz                 # list contents without extracting
```
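Because `f -` makes tar read or write the archive on stdin/stdout, archives can be streamed through pipes - for example over SSH (`tar czf - mydir | ssh backup-host 'cat > mydir.tgz'`, hostnames hypothetical). A local sketch of the same idea:

```sh
# Sketch: copy a tree through a pipe, preserving its structure.
# 'mydir' and 'dest' are illustrative names.
mkdir -p mydir/sub dest
echo "hello" > mydir/sub/file.txt

# One tar writes the archive to stdout, the other reads it from stdin
tar cf - mydir | (cd dest && tar xf -)

ls dest/mydir/sub/file.txt
```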
```sh
tar --create --newer '2024-01-01' -vzf incremental.tgz /var/tmp
tar --create --after-date '2024-01-01' -vzf incremental.tgz /var/tmp
```

Both create an archive of files in /var/tmp modified after January 1, 2024.

Limitation: tar's date test is purely timestamp-based, and the archive records only files that exist at backup time - deletions and renames are not tracked, so a restore can leave stale files behind. Use find to build an explicit file list if you need finer control over what is included.
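A minimal sketch of that find-based approach, feeding the list to tar with `-T` (read names from a file); all names here are hypothetical:

```sh
# Build the file list explicitly with find, then archive exactly that list.
mkdir -p proj
echo "old" > proj/old.txt
touch -d '2020-01-01' proj/old.txt  # pretend this file predates the cutoff
echo "new" > proj/new.txt

touch -d '2024-01-01' ref           # reference timestamp for find
find proj -type f -newer ref > filelist.txt

tar -czf changed.tgz -T filelist.txt
tar -tzf changed.tgz                # lists only proj/new.txt
```

find's other tests (`-perm`, `-user`, `-size`, etc.) can be added to the same pipeline to select files by more than just date.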


dd copies raw data between devices. It bypasses the filesystem and works directly at the block level.

```sh
# Back up the MBR (partition table + bootloader - first 512 bytes)
sudo dd if=/dev/sda of=sda.mbr bs=512 count=1

# Restore the MBR
sudo dd if=sda.mbr of=/dev/sda bs=512 count=1

# Clone one disk to another (DESTRUCTIVE - wipes the destination)
sudo dd if=/dev/sda of=/dev/sdb bs=4M status=progress

# Create a disk image
sudo dd if=/dev/sda of=disk.img bs=4M status=progress
```
  • if= - input file (or device)
  • of= - output file (or device)
  • bs= - block size (larger = faster, up to memory limits)
  • count= - number of blocks to copy

The name dd is often joked to stand for “disk destroyer” - treat it accordingly.
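Given that reputation, it is worth verifying an image byte-for-byte before trusting it. A sketch using a regular file as a stand-in for the block device:

```sh
# Sketch: verify a dd image byte-for-byte. A regular file stands in
# for the block device so this is safe to run anywhere.
dd if=/dev/urandom of=fake-disk bs=1024 count=64 2>/dev/null
dd if=fake-disk of=disk.img bs=4096 2>/dev/null

# cmp exits non-zero at the first differing byte
cmp fake-disk disk.img && echo "image verified"

# Checksums can be stored alongside the image for later checks
sha256sum fake-disk disk.img
```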

For GPT partition tables, use sgdisk instead:

```sh
sudo sgdisk -p /dev/sda                     # display partition table
sudo sgdisk --backup=gpt.bak /dev/sda       # back up GPT
sudo sgdisk --load-backup=gpt.bak /dev/sda  # restore GPT
```

Full-featured backup suites for enterprise use:

| Tool | Description |
| --- | --- |
| Amanda | Advanced Maryland Automatic Network Disk Archiver; uses tar/dump internally; robust and scriptable |
| Bacula | Complex but powerful heterogeneous network backup; best for experienced admins |
| Clonezilla | Disk imaging and deployment; supports many OS types and filesystems; available in Live and Server editions |