# I/O Monitoring
## Why I/O Monitoring Matters

Disk performance problems are often not isolated. They interact with other subsystems in subtle ways:
- Looks like a memory problem: Slow I/O fills memory buffers; the kernel starts evicting cached data, causing more I/O, triggering more eviction - a vicious cycle. A system appears memory-starved when it is actually I/O-bound.
- Looks like a network problem: Network transfers may stall waiting for local I/O to catch up, making the bottleneck appear to be the network.
- CPU reports “idle” but system feels slow: time the CPU spends idle waiting for I/O is reported as `wa` (I/O wait). The system is I/O-bound, not CPU-bound.
A system is considered I/O-bound when the CPU spends significant time idle, waiting for I/O or network buffers to clear.
Rare or non-repeating bottlenecks are especially difficult to debug - real-time monitoring and tracing tools are essential.
## Understanding %iowait

`%iowait` (reported by top, vmstat, and iostat) is the percentage of time the CPU spent idle specifically because it was waiting for I/O.
- High `%iowait` + low CPU usage = I/O bottleneck
- High `%iowait` + high CPU usage = likely a different problem (CPU-bound with some I/O)
- `%iowait` = 0 and the system is still slow = network, application, or algorithm issue
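Under the hood, these tools derive `%iowait` from the cumulative `cpu` line in `/proc/stat`, where field 6 is iowait jiffies. A minimal sketch (assuming a Linux `/proc` filesystem) that samples it over a one-second window:

```shell
#!/bin/sh
# Sample the aggregate "cpu" line in /proc/stat twice and report the
# share of the interval spent in iowait (field 6 of the cpu line).
read_cpu() {
  # print "total_jiffies iowait_jiffies" for the summary cpu line
  awk '/^cpu /{print $2+$3+$4+$5+$6+$7+$8+$9, $6}' /proc/stat
}

set -- $(read_cpu); t1=$1 w1=$2
sleep 1
set -- $(read_cpu); t2=$1 w2=$2

total=$((t2 - t1)); wait=$((w2 - w1))
# guard against a zero-length interval before dividing
if [ "$total" -gt 0 ]; then
  echo "iowait: $((100 * wait / total))%"
else
  echo "iowait: 0%"
fi
```

The counters are cumulative since boot, which is why two samples and a subtraction are needed; vmstat and iostat do the same internally.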
## iostat - Block Device Statistics

iostat is the primary workhorse for monitoring I/O device activity. It reports both CPU utilization and per-device I/O statistics.
```shell
iostat [OPTIONS] [devices] [interval] [count]
```

```shell
iostat                    # single snapshot
iostat 2 5                # update every 2 seconds, 5 times
iostat -x 2               # extended stats (most useful)
iostat -x -k 2            # extended stats in KB/s
iostat -x -m 2            # extended stats in MB/s
iostat -x sda nvme0n1 2   # specific devices only
iostat -p sda 2           # partition-level breakdown
```

Sample output:

```
Linux 6.4.4-200.fc38.x86_64 (fedora)  27/07/23  _x86_64_  (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.44    0.00    2.10    0.06    0.00   93.40

Device      tps  kB_read/s  kB_wrtn/s  kB_dscd/s    kB_read    kB_wrtn    kB_dscd
dm-0      29.41     469.53     331.00     148.89   30633024   21594968    9713760
nvme0n1    0.05       0.55       1.01       0.00      35624      65600          0
sda       23.09     469.75     331.00     148.89   30647163   21595269    9713760
zram0     61.34      82.42     162.96       0.00    5377404   10631948          0
```

### Standard Output Fields

| Field | Meaning |
|---|---|
| `tps` | I/O transactions per second; multiple logical requests can be merged into one physical request |
| `kB_read/s` | Kilobytes read per second |
| `kB_wrtn/s` | Kilobytes written per second |
| `kB_dscd/s` | Kilobytes discarded per second (TRIM operations on SSDs) |
Partitions from the same disk appear as separate entries. If LVM is in use, dm-X (device mapper) entries appear alongside physical devices.
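To see which physical devices sit beneath a given `dm-X` entry, you can read the `slaves/` directory the kernel exposes for each device-mapper node; a small sketch (on a system without LVM it simply prints nothing):

```shell
#!/bin/sh
# Map each device-mapper node (dm-0, dm-1, ...) to the physical
# devices beneath it, using /sys/block/dm-*/slaves.
for dm in /sys/block/dm-*; do
  [ -d "$dm/slaves" ] || continue
  # prefer the friendly LVM name when the kernel provides one
  name=$(cat "$dm/dm/name" 2>/dev/null || basename "$dm")
  printf '%s <- %s\n' "$name" "$(ls "$dm/slaves" | tr '\n' ' ')"
done
```

`lsblk` shows the same relationships as a tree, which is usually the quicker option when it is installed.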
### Extended Output (-x) - The Important Columns

```shell
iostat -x 2
```

| Field | Meaning | Warning threshold |
|---|---|---|
| `r/s` | Read requests per second | |
| `w/s` | Write requests per second | |
| `rkB/s` | Kilobytes read per second | |
| `wkB/s` | Kilobytes written per second | |
| `rrqm/s` | Read requests merged per second (sequential reads) | |
| `wrqm/s` | Write requests merged per second | |
| `r_await` | Average wait time for read requests (ms) | >20 ms = concern |
| `w_await` | Average wait time for write requests (ms) | >20 ms = concern |
| `aqu-sz` | Average queue size (requests waiting) | >1 on a single disk = saturated |
| `%util` | Percentage of time the device was busy | >80% = approaching saturation |
A %util of 100% means the device was busy for the entire sample period - it is saturated. High await values combined with high %util confirm a true disk bottleneck.
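%util itself is derived from the `io_ticks` counter (field 10 of `/sys/block/<dev>/stat`, milliseconds the device had I/O in flight), so you can reproduce it by hand. A sketch; the default device name `sda` is only an example, pass your own as the first argument:

```shell
#!/bin/sh
# Approximate %util for one device by sampling io_ticks (field 10 of
# /sys/block/<dev>/stat) over a one-second window.
dev=${1:-sda}                    # "sda" is an assumed example device
stat=/sys/block/$dev/stat
if [ ! -r "$stat" ]; then
  echo "no readable stats for $dev" >&2
  exit 0
fi
b1=$(awk '{print $10}' "$stat")
sleep 1
b2=$(awk '{print $10}' "$stat")
# busy milliseconds over a 1000 ms window, so divide by 10 for percent
echo "%util over 1s: $(( (b2 - b1) / 10 ))%"
```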
## iotop - Per-Process I/O

iotop shows which processes are consuming disk I/O, updated in real time - like top, but for disk. It is most useful when iostat shows high I/O and you need to know which process is responsible.
```shell
sudo iotop            # requires root
sudo iotop -o         # only show processes actively doing I/O
sudo iotop -b -n 5    # batch mode (non-interactive), 5 iterations
sudo iotop -d 2       # update every 2 seconds
sudo iotop -p PID     # watch specific process
```

Sample output:

```
Total DISK READ:   0.00 B/s | Total DISK WRITE:   4.5 MB/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 4.5 MB/s
  TID  PRIO  USER     DISK READ  DISK WRITE  COMMAND
 1234  be/4  root      0.00 B/s    4.5 MB/s  mysqld
    1  be/4  root      0.00 B/s    0.00 B/s  systemd
 5678  be/0  root      0.00 B/s    0.00 B/s  [kworker/0:0H]
```

PRIO column values:

- `be` - best effort (standard I/O scheduler class)
- `rt` - real time (highest I/O priority)
- `idle` - only uses I/O when nothing else needs it
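The same scheduling classes shown in the PRIO column can be assigned with `ionice` (util-linux), for example to keep a heavy scan from competing with foreground I/O:

```shell
# Run a heavy scan in the idle class: it only gets disk time when
# nothing else wants it (-c 3 = idle, -c 2 = best-effort, -c 1 = realtime)
ionice -c 3 du -sh /var/log 2>/dev/null || true

# Show the I/O scheduling class of the current shell
ionice -p $$
```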
### iotop Interactive Keys

| Key | Action |
|---|---|
| `o` | Toggle filtering to only active I/O processes |
| `p` | Toggle showing process names vs thread names |
| `a` | Toggle between current rate and accumulated totals |
| `q` | Quit |
## Identifying I/O Offenders

When you see high I/O and need to find the source:
```shell
# 1. Confirm I/O is the bottleneck
iostat -x 2          # look for high %util and r_await/w_await on a device

# 2. Find which process is causing it
sudo iotop -o

# 3. Check for processes with many open files
lsof | wc -l         # total open file descriptors system-wide
lsof -u username     # open files for a specific user
lsof +D /var/log     # all processes with files open under /var/log

# 4. Find deleted files that are still being written to (space leak)
lsof | grep deleted
```
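The deleted-file check also works without lsof, by scanning the fd symlinks under `/proc`, which is handy on minimal systems; a sketch:

```shell
#!/bin/sh
# List open file descriptors whose target file has been unlinked:
# readlink shows " (deleted)" appended to the original path.
for fd in /proc/[0-9]*/fd/*; do
  target=$(readlink "$fd" 2>/dev/null) || continue
  case $target in
    *'(deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
  esac
done
```

Without root you will only see your own processes' descriptors, so run it with sudo when hunting a system-wide space leak.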
```shell
# 5. Check for excessive sync or fsync calls by a process
strace -p PID -e trace=fsync,fdatasync,sync 2>&1 | head -20
```

## I/O Tuning and Scheduling

The Linux kernel uses I/O schedulers to order and merge disk requests:
```shell
# Check current scheduler for a device (the active one is in brackets)
cat /sys/block/sda/queue/scheduler

# Change scheduler
echo mq-deadline > /sys/block/sda/queue/scheduler
```

| Scheduler | Best for |
|---|---|
| `mq-deadline` | General purpose; default for HDDs |
| `none` (NOOP) | NVMe SSDs and devices with their own queuing |
| `bfq` | Desktops; prioritizes interactive processes |
| `kyber` | Low-latency SSDs |
For fast SSDs (especially NVMe), `none` or `mq-deadline` is generally preferred: the device's own hardware queue handles ordering more efficiently than a software scheduler can.
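To audit the active scheduler (the one shown in brackets) across every block device at once, a quick loop over sysfs:

```shell
#!/bin/sh
# Print "<device>: <available schedulers>" for all block devices;
# the scheduler currently in use is the one in [brackets].
for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] || continue
  dev=${f#/sys/block/}; dev=${dev%%/*}
  printf '%s: %s\n' "$dev" "$(cat "$f")"
done
```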
## Quick I/O Diagnostics Workflow

```shell
# Step 1: Is the system I/O-bound?
vmstat 1 5        # watch the 'wa' column (iowait)
top               # check '%wa' in the CPU line

# Step 2: Which device is saturated?
iostat -x 2       # high %util + high await = that device

# Step 3: Which process is responsible?
sudo iotop -o     # real-time per-process I/O

# Step 4: What files are being accessed?
lsof -p PID       # open files for the offending process

# Step 5: What syscalls are being made?
strace -p PID -e trace=read,write,open,fsync
```