# I/O Monitoring
## Why I/O Monitoring Matters

Disk performance problems are often not isolated. They interact with other subsystems in subtle ways:
- Looks like a memory problem: Slow I/O fills memory buffers; the kernel starts evicting cached data, causing more I/O, triggering more eviction - a vicious cycle. A system appears memory-starved when it is actually I/O-bound.
- Looks like a network problem: Network transfers may stall waiting for local I/O to catch up, making the bottleneck appear to be the network.
- CPU reports “idle” but system feels slow: time the CPU spends idle waiting for I/O is reported as `wa` (I/O wait). The system is I/O-bound, not CPU-bound.
A system is considered I/O-bound when the CPU spends significant time idle, waiting for I/O or network buffers to clear.
Rare or non-repeating bottlenecks are especially difficult to debug - real-time monitoring and tracing tools are essential.
## Understanding %iowait

`%iowait` (reported by top, vmstat, and iostat) is the percentage of time the CPU spent idle specifically because it was waiting for I/O.
- High `%iowait` + low CPU usage = I/O bottleneck
- High `%iowait` + high CPU usage = likely a different problem (CPU-bound with some I/O)
- `%iowait` = 0 and the system is still slow = network, application, or algorithm issue
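Under the hood, these tools derive `%iowait` from the cumulative `cpu` line in `/proc/stat`, where field 6 is iowait jiffies. A minimal sketch (assuming a Linux `/proc` filesystem) that samples it over a one-second window:

```shell
#!/bin/sh
# Sample the aggregate "cpu" line in /proc/stat twice and report the
# share of the interval spent in iowait (field 6 of the cpu line).
read_cpu() {
  # print "total_jiffies iowait_jiffies" for the summary cpu line
  awk '/^cpu /{print $2+$3+$4+$5+$6+$7+$8+$9, $6}' /proc/stat
}

set -- $(read_cpu); t1=$1 w1=$2
sleep 1
set -- $(read_cpu); t2=$1 w2=$2

total=$((t2 - t1)); wait=$((w2 - w1))
# guard against a zero-length interval before dividing
if [ "$total" -gt 0 ]; then
  echo "iowait: $((100 * wait / total))%"
else
  echo "iowait: 0%"
fi
```

The counters are cumulative since boot, which is why two samples and a subtraction are needed; vmstat and iostat do the same internally.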
## iostat - Block Device Statistics

iostat is the primary workhorse for monitoring I/O device activity. It reports both CPU utilization and per-device I/O statistics.
```shell
iostat [OPTIONS] [devices] [interval] [count]
```

```shell
iostat                    # single snapshot
iostat 2 5                # update every 2 seconds, 5 times
iostat -x 2               # extended stats (most useful)
iostat -x -k 2            # extended stats in KB/s
iostat -x -m 2            # extended stats in MB/s
iostat -x sda nvme0n1 2   # specific devices only
iostat -p sda 2           # partition-level breakdown
```

Sample output:

```
Linux 6.4.4-200.fc38.x86_64 (fedora)  27/07/23  _x86_64_  (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.44    0.00    2.10    0.06    0.00   93.40

Device      tps  kB_read/s  kB_wrtn/s  kB_dscd/s    kB_read    kB_wrtn    kB_dscd
dm-0      29.41     469.53     331.00     148.89   30633024   21594968    9713760
nvme0n1    0.05       0.55       1.01       0.00      35624      65600          0
sda       23.09     469.75     331.00     148.89   30647163   21595269    9713760
zram0     61.34      82.42     162.96       0.00    5377404   10631948          0
```

### Standard Output Fields

| Field | Meaning |
|---|---|
| `tps` | I/O transactions per second; multiple logical requests can be merged into one physical request |
| `kB_read/s` | Kilobytes read per second |
| `kB_wrtn/s` | Kilobytes written per second |
| `kB_dscd/s` | Kilobytes discarded per second (TRIM operations on SSDs) |
Partitions from the same disk appear as separate entries. If LVM is in use, dm-X (device mapper) entries appear alongside physical devices.
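To see which physical devices sit beneath a given `dm-X` entry, you can read the `slaves/` directory the kernel exposes for each device-mapper node; a small sketch (on a system without LVM it simply prints nothing):

```shell
#!/bin/sh
# Map each device-mapper node (dm-0, dm-1, ...) to the physical
# devices beneath it, using /sys/block/dm-*/slaves.
for dm in /sys/block/dm-*; do
  [ -d "$dm/slaves" ] || continue
  # prefer the friendly LVM name when the kernel provides one
  name=$(cat "$dm/dm/name" 2>/dev/null || basename "$dm")
  printf '%s <- %s\n' "$name" "$(ls "$dm/slaves" | tr '\n' ' ')"
done
```

`lsblk` shows the same relationships as a tree, which is usually the quicker option when it is installed.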
### Extended Output (-x) - The Important Columns

```shell
iostat -x 2
```

| Field | Meaning | Warning threshold |
|---|---|---|
| `r/s` | Read requests per second | |
| `w/s` | Write requests per second | |
| `rkB/s` | Kilobytes read per second | |
| `wkB/s` | Kilobytes written per second | |
| `rrqm/s` | Read requests merged per second (sequential reads) | |
| `wrqm/s` | Write requests merged per second | |
| `r_await` | Average wait time for read requests (ms) | >20 ms = concern |
| `w_await` | Average wait time for write requests (ms) | >20 ms = concern |
| `aqu-sz` | Average queue size (requests waiting) | >1 on a single disk = saturated |
| `%util` | Percentage of time the device was busy | >80% = approaching saturation |
A %util of 100% means the device was busy for the entire sample period - it is saturated. High await values combined with high %util confirm a true disk bottleneck.
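%util itself is derived from the `io_ticks` counter (field 10 of `/sys/block/<dev>/stat`, milliseconds the device had I/O in flight), so you can reproduce it by hand. A sketch; the default device name `sda` is only an example, pass your own as the first argument:

```shell
#!/bin/sh
# Approximate %util for one device by sampling io_ticks (field 10 of
# /sys/block/<dev>/stat) over a one-second window.
dev=${1:-sda}                    # "sda" is an assumed example device
stat=/sys/block/$dev/stat
if [ ! -r "$stat" ]; then
  echo "no readable stats for $dev" >&2
  exit 0
fi
b1=$(awk '{print $10}' "$stat")
sleep 1
b2=$(awk '{print $10}' "$stat")
# busy milliseconds over a 1000 ms window, so divide by 10 for percent
echo "%util over 1s: $(( (b2 - b1) / 10 ))%"
```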
## iotop - Per-Process I/O

iotop shows which processes are consuming disk I/O, updated in real time - like top, but for disk. It is most useful when iostat shows high I/O and you need to know which process is responsible.
```shell
sudo iotop            # requires root
sudo iotop -o         # only show processes actively doing I/O
sudo iotop -b -n 5    # batch mode (non-interactive), 5 iterations
sudo iotop -d 2       # update every 2 seconds
sudo iotop -p PID     # watch specific process
```

Sample output:

```
Total DISK READ:   0.00 B/s | Total DISK WRITE:   4.5 MB/s
Current DISK READ: 0.00 B/s | Current DISK WRITE: 4.5 MB/s
  TID  PRIO  USER     DISK READ  DISK WRITE  COMMAND
 1234  be/4  root      0.00 B/s    4.5 MB/s  mysqld
    1  be/4  root      0.00 B/s    0.00 B/s  systemd
 5678  be/0  root      0.00 B/s    0.00 B/s  [kworker/0:0H]
```

PRIO column values:

- `be` - best effort (standard I/O scheduler class)
- `rt` - real time (highest I/O priority)
- `idle` - only uses I/O when nothing else needs it
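The same scheduling classes shown in the PRIO column can be assigned with `ionice` (util-linux), for example to keep a heavy scan from competing with foreground I/O:

```shell
# Run a heavy scan in the idle class: it only gets disk time when
# nothing else wants it (-c 3 = idle, -c 2 = best-effort, -c 1 = realtime)
ionice -c 3 du -sh /var/log 2>/dev/null || true

# Show the I/O scheduling class of the current shell
ionice -p $$
```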
### iotop Interactive Keys

| Key | Action |
|---|---|
| `o` | Toggle filtering to only active I/O processes |
| `p` | Toggle showing process names vs thread names |
| `a` | Toggle between current rate and accumulated totals |
| `q` | Quit |
## Identifying I/O Offenders

When you see high I/O and need to find the source:
```shell
# 1. Confirm I/O is the bottleneck
iostat -x 2          # look for high %util and r_await/w_await on a device

# 2. Find which process is causing it
sudo iotop -o

# 3. Check for processes with many open files
lsof | wc -l         # total open file descriptors system-wide
lsof -u username     # open files for a specific user
lsof +D /var/log     # all processes with files open under /var/log

# 4. Find deleted files that are still being written to (space leak)
lsof | grep deleted
```
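The deleted-file check also works without lsof, by scanning the fd symlinks under `/proc`, which is handy on minimal systems; a sketch:

```shell
#!/bin/sh
# List open file descriptors whose target file has been unlinked:
# readlink shows " (deleted)" appended to the original path.
for fd in /proc/[0-9]*/fd/*; do
  target=$(readlink "$fd" 2>/dev/null) || continue
  case $target in
    *'(deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
  esac
done
```

Without root you will only see your own processes' descriptors, so run it with sudo when hunting a system-wide space leak.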
```shell
# 5. Check for excessive sync or fsync calls by a process
strace -p PID -e trace=fsync,fdatasync,sync 2>&1 | head -20
```

## I/O Tuning and Scheduling

The Linux kernel uses I/O schedulers to order and merge disk requests:
```shell
# Check current scheduler for a device (the active one is in brackets)
cat /sys/block/sda/queue/scheduler

# Change scheduler
echo mq-deadline > /sys/block/sda/queue/scheduler
```

| Scheduler | Best for |
|---|---|
| `mq-deadline` | General purpose; default for HDDs |
| `none` (NOOP) | NVMe SSDs and devices with their own queuing |
| `bfq` | Desktops; prioritizes interactive processes |
| `kyber` | Low-latency SSDs |
For fast SSDs (especially NVMe), `none` or `mq-deadline` is generally preferred: the device's own hardware queue handles ordering more efficiently than a software scheduler can.
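To audit the active scheduler (the one shown in brackets) across every block device at once, a quick loop over sysfs:

```shell
#!/bin/sh
# Print "<device>: <available schedulers>" for all block devices;
# the scheduler currently in use is the one in [brackets].
for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] || continue
  dev=${f#/sys/block/}; dev=${dev%%/*}
  printf '%s: %s\n' "$dev" "$(cat "$f")"
done
```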
## Quick I/O Diagnostics Workflow

```shell
# Step 1: Is the system I/O-bound?
vmstat 1 5        # watch the 'wa' column (iowait)
top               # check '%wa' in the CPU line

# Step 2: Which device is saturated?
iostat -x 2       # high %util + high await = that device

# Step 3: Which process is responsible?
sudo iotop -o     # real-time per-process I/O

# Step 4: What files are being accessed?
lsof -p PID       # open files for the offending process

# Step 5: What syscalls are being made?
strace -p PID -e trace=read,write,open,fsync
```