Memory Management

Memory and I/O Are Intrinsically Linked

Memory tuning in Linux is complex because memory usage and I/O throughput are tightly coupled. In most cases, the majority of memory is being used to cache file contents from disk. This is deliberate - Linux aggressively caches disk data in RAM to avoid re-reading from the slower physical disk.

Consequences:

Changing memory parameters has a large effect on I/O performance
Changing I/O parameters has an equally large effect on the virtual memory subsystem
Optimize one without considering the other and you may make things worse

Viewing Memory Usage

`free` - Memory Summary

free -h                     # human-readable (GiB/MiB)
free -m                     # in mebibytes (MiB)
free -s 2                   # update every 2 seconds

             total        used        free      shared  buff/cache   available
Mem:          7763        3178         646        1022        3938        3262
Swap:         7762        1034        6728

Column	Meaning
`total`	Total usable RAM (installed RAM minus what the kernel reserves at boot)
`used`	Memory in use, not counting free memory or buff/cache
`free`	Completely unused RAM
`shared`	Memory shared between processes (mostly tmpfs and shmem)
`buff/cache`	Used for kernel buffers and file cache (largely reclaimable)
`available`	Estimate of memory available to new processes without swapping (roughly free plus the reclaimable part of buff/cache)

`/proc/meminfo` - Detailed Breakdown

cat /proc/meminfo

MemTotal:        7949804 kB
MemFree:          669748 kB
MemAvailable:    3355456 kB
Buffers:              28 kB
Cached:          3777140 kB
SwapCached:        13160 kB
Active:          2357428 kB
Inactive:        3249488 kB
Active(anon):    1659132 kB    # anonymous (heap/stack) active
Inactive(anon):  1201760 kB
Active(file):     698296 kB    # file-backed active cache
Inactive(file):  2047728 kB
Unevictable:      583624 kB
Mlocked:             220 kB
SwapTotal:       7949308 kB
...
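As a rough health check, MemAvailable can be expressed as a share of MemTotal. The following is a minimal sketch; the function name `mem_available_pct` is hypothetical, and it assumes the `Key: value kB` layout shown above.

```shell
# Print MemAvailable as a percentage of MemTotal (one decimal place).
# Usage: mem_available_pct [meminfo-file]   (defaults to /proc/meminfo)
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0) printf "%.1f\n", avail * 100 / total
            else exit 1          # no MemTotal line found
        }
    ' "${1:-/proc/meminfo}"
}
```

Calling `mem_available_pct` with no argument reads the live /proc/meminfo; passing a saved copy makes it easy to compare snapshots.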

Memory Monitoring Tools

Tool	Purpose	Package
`free`	Brief summary: total, used, free, cache, available	procps
`vmstat`	Detailed memory + swap + I/O + CPU, updateable	procps
`pmap`	Per-process memory map showing segments + sizes	procps

pmap -x PID                 # extended output with RSS per segment
pmap -d PID                 # device format

`vmstat` - Virtual Memory Statistics

vmstat is a multi-purpose tool that reports memory, paging, I/O, CPU, and process information in one view.

vmstat [options] [delay] [count]

vmstat 2 4                  # report every 2 seconds, 4 times
vmstat -SM -a 2 4           # in MB, show active/inactive memory
vmstat -p /dev/sda3 2 4     # per-partition I/O stats

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0 1048576 910912    28 4061280    6   16    62    42   52  151  4  2 94  0  0

Column key:

Field	Meaning
`r`	Processes waiting for CPU (run queue length)
`b`	Processes in uninterruptible sleep (I/O wait)
`swpd`	Virtual memory used (swap)
`free`	Idle memory
`buff`	Kernel buffer cache
`cache`	Page cache
`si`	Swap-in rate (kB/s from disk to RAM)
`so`	Swap-out rate (kB/s from RAM to disk) - if non-zero, you’re memory-constrained
`bi` / `bo`	Block I/O in/out (1024-byte blocks per second, not sectors)
`us/sy/id/wa/st`	CPU: user/system/idle/I/O wait/stolen by the hypervisor %

If so (swap-out) is persistently non-zero, the system is under memory pressure. If r consistently exceeds the number of CPU cores, you are CPU-bound.
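The swap-out check can be automated by counting how many vmstat samples have a non-zero so column. This is a sketch only: `swapout_samples` is a hypothetical helper, and it assumes the default column layout shown above, where so is the eighth field. It skips the first report because vmstat's first line is an average since boot, not a live sample.

```shell
# Read vmstat output on stdin and count samples with swap-out activity.
# Assumes the default layout, in which "so" is column 8.
swapout_samples() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 17 {      # data rows only; headers are skipped
            if (++seen == 1) next          # first report is the since-boot average
            samples++
            if ($8 > 0) hits++
        }
        END { printf "%d of %d samples show swap-out\n", hits, samples }
    '
}
```

A typical invocation would be `vmstat 2 10 | swapout_samples`; a high proportion of hits over a longer run suggests sustained memory pressure.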

Active vs Inactive Memory:

With -a, vmstat shows active and inactive memory instead of buff/cache:

Active: recently used pages; may be clean or dirty
Inactive: not recently used; the first candidates for reclaim under pressure (clean file pages can be dropped immediately, dirty or anonymous pages must be written out first)

Virtual Memory and Swap

Linux implements a virtual memory system - processes can be given more memory than physically exists. This works in two ways:

Memory overcommit: Many processes never touch all the memory they request, so the kernel hands out physical pages only when they are first used. Copy-On-Write (COW) follows the same idea: when a child process is forked, it shares the parent’s pages, and a physical copy is made only when either process modifies a page. This keeps fork() fast even for large processes, because only the page tables are copied up front.
Swapping: When physical RAM is exhausted, the kernel moves less active memory pages from RAM to a swap partition or file on disk. Pages are brought back (swapped in) on-demand when accessed again.

Swap Recommended Size

General guidance: equal to installed RAM. Systems with large RAM may use less swap; hibernation requires at least as much swap as RAM.

Managing Swap

# View current swap usage
cat /proc/swaps
swapon --show               # tabular, with priority

# Runtime free memory check
free -h

# Create and enable a swap file
sudo dd if=/dev/zero of=/swapfile bs=1M count=4096  # create 4 GiB file (bs=1G would need a 1 GiB buffer)
sudo chmod 600 /swapfile
sudo mkswap /swapfile       # format as swap
sudo swapon /swapfile       # enable

# Disable swap (the kernel first moves its pages back into RAM, so enough free memory is needed)
sudo swapoff /swapfile

# Add to /etc/fstab to persist across reboots:
# /swapfile none swap sw 0 0

Swap Priority

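Linux supports multiple swap areas, each with a priority. A higher number means higher priority. Lower priority areas are not used until higher priority areas fill, and areas with equal priority are used in round-robin fashion: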

swapon -p 10 /dev/sdb1      # enable with priority 10
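To see the order in which swap areas will be filled, the entries in /proc/swaps can be sorted by their priority column. A small sketch, where `swaps_by_priority` is a hypothetical name and the five-column /proc/swaps layout (Filename, Type, Size, Used, Priority) is assumed:

```shell
# List swap areas as "<priority> <path>", highest priority first.
# Usage: swaps_by_priority [swaps-file]   (defaults to /proc/swaps)
swaps_by_priority() {
    # NR > 1 skips the header line; priorities may be negative.
    awk 'NR > 1 { print $5, $1 }' "${1:-/proc/swaps}" | sort -k1,1nr
}
```

Areas enabled without an explicit priority get negative values assigned by the kernel, so they sort after any area enabled with `-p`.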

What Gets Cached vs Swapped

At any moment, most memory is used to cache file contents. File-backed pages never need to go to swap because their backing store is the file on disk: clean pages are simply dropped and re-read later if needed, while dirty pages (modified content not yet written to disk) are written back to their file first.

Only anonymous memory (heap, stack, mmap without a file) and shared memory such as tmpfs is written to the swap area when reclaimed.
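To find out whose anonymous memory has actually been swapped out, the VmSwap field in each process's /proc/PID/status can be collected. The sketch below uses a hypothetical helper name, `swap_by_process`, and assumes the `Name:` and `VmSwap:` lines are present; kernel threads have no VmSwap line and are skipped.

```shell
# Print "<kB swapped> <pid> <name>" for processes with swapped-out pages,
# largest first. Only the first word of the process name is shown.
# Usage: swap_by_process [proc-root]   (defaults to /proc)
swap_by_process() {
    root="${1:-/proc}"
    for status in "$root"/[0-9]*/status; do
        [ -r "$status" ] || continue       # process may have exited already
        awk -v f="$status" '
            $1 == "Name:"   { name = $2 }
            $1 == "VmSwap:" { kb = $2 }
            END {
                n = split(f, parts, "/")   # the pid is the directory name
                if (kb > 0) print kb, parts[n - 1], name
            }
        ' "$status" 2>/dev/null
    done | sort -k1,1nr
}
```

Run as root to see every process; as a regular user, status files for other users' processes are still readable on most systems, but that depends on how /proc is mounted.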

Tuning the VM Subsystem: `/proc/sys/vm`

The /proc/sys/vm directory exposes kernel VM tuning knobs. You can change them live by writing directly or using sysctl:

ls /proc/sys/vm/            # list available parameters
sysctl vm.swappiness        # read a parameter
sudo sysctl -w vm.swappiness=10  # set temporarily (lost at reboot)
echo "vm.swappiness=10" | sudo tee -a /etc/sysctl.conf   # persist across reboots

Key Parameters

Parameter	Default	Effect
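`vm.swappiness`	60	How readily the kernel swaps anonymous pages instead of dropping file cache (low values keep anonymous memory in RAM, high values swap more readily). The classic range is 0-100; newer kernels accept up to 200. Values of 10-20 are commonly suggested for desktops.
`vm.dirty_ratio`	20	% of available memory that may hold dirty (unwritten) pages before processes that write are blocked and made to flush them
`vm.dirty_background_ratio`	10	% of available memory holding dirty pages at which background writeback starts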
`vm.overcommit_memory`	0	Overcommit policy (see below)
`vm.vfs_cache_pressure`	100	Tendency to reclaim memory from directory/inode cache
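Before changing anything, it helps to record the current values of the parameters in the table. A minimal sketch, with `show_vm_params` as a hypothetical helper that reads the files directly rather than calling sysctl:

```shell
# Print the current value of each parameter from the table above.
# Usage: show_vm_params [vm-dir]   (defaults to /proc/sys/vm)
show_vm_params() {
    dir="${1:-/proc/sys/vm}"
    for p in swappiness dirty_ratio dirty_background_ratio \
             overcommit_memory vfs_cache_pressure; do
        # Skip parameters that this kernel does not expose.
        if [ -r "$dir/$p" ]; then
            printf 'vm.%s = %s\n' "$p" "$(cat "$dir/$p")"
        fi
    done
}
```

Saving the output before a tuning session gives a baseline to return to if a change makes things worse.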

Three primary tuning areas:

Flush behavior: how many dirty pages are allowed and how often they are written to disk
Swap behavior: when to swap anonymous pages vs keep file-backed pages in RAM
Overcommit level: how much memory allocation beyond physical RAM + swap is allowed
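For flush behavior, it can help to turn the two dirty-page percentages into absolute sizes. The sketch below is an approximation under a stated assumption: it treats MemFree + Active(file) + Inactive(file) as the memory the percentages apply to, which is close to, but not exactly, what the kernel uses internally. The function name `dirty_thresholds_kb` is hypothetical.

```shell
# Estimate the dirty-page thresholds in kB.
# Assumption: the ratios apply to MemFree + Active(file) + Inactive(file).
# Usage: dirty_thresholds_kb <dirty_ratio> <dirty_background_ratio> [meminfo-file]
dirty_thresholds_kb() {
    awk -v r="$1" -v b="$2" '
        $1 == "MemFree:" || $1 == "Active(file):" || $1 == "Inactive(file):" {
            base += $2
        }
        END {
            printf "background writeback starts near %d kB\n", base * b / 100
            printf "writers are throttled near %d kB\n", base * r / 100
        }
    ' "${3:-/proc/meminfo}"
}
```

For example, `dirty_thresholds_kb 20 10` uses the default ratios against the live system; the Dirty line in /proc/meminfo shows how close the system currently is to those figures.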

OOM (Out of Memory) Killer

When memory demand exceeds physical RAM, the kernel has three ways to respond:

Refuse allocations - malloc() fails; applications that do not handle the failure crash
Use swap - slower but extends capacity
Overcommit, then use OOM Killer - Linux’s default approach

Linux allows memory overcommitment: granting allocation requests beyond what RAM + swap can hold, because many processes never fully use their allocated memory (COW, buffers never filled). The kernel tracks this and selects a victim to kill only when the overcommit becomes real.

Overcommit Policy: `/proc/sys/vm/overcommit_memory`

Value	Behavior
`0` (default)	Allow overcommit, but refuse obvious overcommits. Root gets more headroom than regular users.
`1`	Allow all memory requests unconditionally. Maximum overcommit.
`2`	Disable overcommit. Fail if total allocations exceed `swap + (overcommit_ratio % of RAM)`.
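The limit that applies in mode 2 can be estimated from /proc/meminfo. The following sketch implements only the formula from the table, swap + (overcommit_ratio % of RAM), with the default overcommit_ratio of 50; the kernel's own figure also accounts for huge pages, so the CommitLimit line in /proc/meminfo remains the authoritative value. `commit_limit_kb` is a hypothetical name.

```shell
# Estimate the mode-2 commit limit in kB: SwapTotal + MemTotal * ratio / 100.
# Usage: commit_limit_kb [overcommit_ratio] [meminfo-file]
commit_limit_kb() {
    awk -v ratio="${1:-50}" '
        $1 == "MemTotal:"  { mem = $2 }
        $1 == "SwapTotal:" { swap = $2 }
        END { printf "%d\n", swap + mem * ratio / 100 }
    ' "${2:-/proc/meminfo}"
}
```

Comparing the result with Committed_AS in /proc/meminfo shows how much address space is already committed relative to the limit.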

How the OOM Killer Selects Victims

Each process has an oom_score in /proc/[pid]/oom_score. Higher scores = more likely to be killed.

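On current kernels the score is based on:

How much memory the process uses (resident pages, swap entries, and page tables) relative to the memory available to it
The process’s oom_score_adj value (see below)

Older kernels also considered how long the process had been running and whether it was privileged.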

cat /proc/$(pgrep -o firefox)/oom_score   # check one process's score (-o picks the oldest match, since pgrep may return several PIDs)

# Protect a critical process from being OOM-killed (-1000 = never kill; lowering the value requires root):
echo -1000 | sudo tee /proc/PID/oom_score_adj

# Make a process more killable (positive values, up to 1000):
echo 500 > /proc/PID/oom_score_adj
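To see which processes the OOM killer would currently prefer, the scores of all processes can be ranked. A minimal sketch, with `top_oom_scores` as a hypothetical helper that reads oom_score and comm from each /proc/PID directory:

```shell
# Print the N processes with the highest oom_score as "<score> <pid> <comm>".
# Usage: top_oom_scores [n] [proc-root]   (defaults: 5, /proc)
top_oom_scores() {
    n="${1:-5}"
    root="${2:-/proc}"
    for dir in "$root"/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        # A process can exit between the glob and the read; skip it if so.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        comm=$(cat "$dir/comm" 2>/dev/null)
        printf '%s %s %s\n' "$score" "${dir##*/}" "$comm"
    done | sort -k1,1nr | head -n "$n"
}
```

Running `top_oom_scores 10` before and after adjusting oom_score_adj is a quick way to confirm that a protected process has dropped down the list.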

Viewing OOM Events

dmesg -T | grep -i "oom\|killed process"
journalctl -k | grep -i oom
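When only the victims matter, the PID and name can be pulled out of the matching kernel log lines. This sketch assumes the message contains the text `Killed process <pid> (<name>)`, which is the usual wording but can differ between kernel versions; `oom_victims` is a hypothetical helper name.

```shell
# Read kernel log lines on stdin and print "<pid> <name>" for each OOM kill.
# Assumes lines of the form "... Killed process <pid> (<name>) ...".
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}
```

It can be fed from either command above, for example `journalctl -k | oom_victims`.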