TCP Congestion Control

TCP’s job is to fill the pipe without breaking the network. If every TCP sender pushed data at maximum rate simultaneously, routers would drop packets as queues overflow. TCP congestion control is the set of algorithms that make TCP self-regulate to avoid this.

The key insight: packet loss is TCP’s signal that the network is overloaded. TCP uses loss (and latency in newer algorithms) as feedback to slow down.

| Variable | Meaning |
| --- | --- |
| CWND (congestion window) | How many unacknowledged segments TCP is allowed to have in flight at once |
| SSTHRESH (slow start threshold) | The CWND size at which TCP switches from exponential growth to linear growth |
| RTT (round-trip time) | How long a segment-plus-ACK round trip takes; drives TCP's timing algorithms |
| MSS (maximum segment size) | The largest data payload per TCP segment (typically 1460 bytes on Ethernet) |

Slow Start

Despite the name, slow start is exponential growth; it just starts small:

  • CWND starts at 1 MSS (or 10 MSS on modern Linux, per RFC 6928)
  • Each ACK received → CWND += 1 MSS
  • After 1 RTT → CWND has doubled

RTT 0: CWND = 1 MSS (send 1 segment)
RTT 1: CWND = 2 MSS (send 2 segments)
RTT 2: CWND = 4 MSS (send 4 segments)
RTT 3: CWND = 8 MSS (send 8 segments)
...continues until CWND >= SSTHRESH
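The growth pattern above can be sketched as a toy model in Python (CWND in MSS units; the SSTHRESH value here is an arbitrary example, not a kernel default):

```python
def slow_start(cwnd=1, ssthresh=64, rtts=8):
    """Per-RTT CWND trace: exponential below ssthresh, linear above it."""
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2   # slow start: every ACKed segment adds 1 MSS, so CWND doubles per RTT
        else:
            cwnd += 1   # congestion avoidance: +1 MSS per RTT
    return history

print(slow_start())  # [1, 2, 4, 8, 16, 32, 64, 65]
```

Note how CWND crosses SSTHRESH at RTT 6 and the curve flattens from doubling to +1 per RTT.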

Congestion Avoidance

Once CWND hits SSTHRESH, TCP switches to linear growth (additive increase):

  • Each ACK received → CWND += 1/CWND segments, which works out to roughly +1 MSS per RTT
  • Combined with halving on loss, this is "Additive Increase, Multiplicative Decrease" (AIMD)
| Event | What TCP does | Why |
| --- | --- | --- |
| Packet loss (timeout) | SSTHRESH = CWND/2, CWND = 1 MSS, restart slow start | Timeout = severe congestion signal |
| 3 duplicate ACKs (fast retransmit) | SSTHRESH = CWND/2, CWND = SSTHRESH (TCP Reno) | 3 dupACKs = mild congestion; don't restart from scratch |
| ECN signal (Explicit Congestion Notification) | Same as 3 dupACKs, but without packet loss | Router marks packets before it would drop them |
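These reactions can be sketched as a small state update, Reno-style. A toy model: the floor of 2 MSS follows RFC 5681's minimum SSTHRESH of 2 segments, and the input values are illustrative:

```python
def on_congestion(cwnd, ssthresh, event):
    """Return (new_cwnd, new_ssthresh) after a congestion signal (MSS units)."""
    if event == "timeout":
        # Severe: halve the threshold, restart slow start from 1 MSS
        return 1, max(cwnd // 2, 2)
    if event in ("dupacks", "ecn"):
        # Mild: halve, but keep sending at the new threshold (fast recovery)
        new_ssthresh = max(cwnd // 2, 2)
        return new_ssthresh, new_ssthresh
    return cwnd, ssthresh  # no signal: unchanged

print(on_congestion(40, 64, "timeout"))  # (1, 20)
print(on_congestion(40, 64, "dupacks"))  # (20, 20)
```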

TCP Lifecycle


The classic algorithms (TCP Reno, and later Cubic) react to loss. On fast, long-delay links you need a large amount of data in flight before loss appears, and when it does, the sender slams the brakes by halving CWND. The faster the link, the worse loss-based congestion control performs.
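To see why, work out the bandwidth-delay product (BDP): the amount of unacknowledged data that must be in flight to keep the pipe full. The link numbers below are just a worked example:

```python
# Bandwidth-delay product: bytes in flight needed to saturate the path.
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    return bandwidth_bps / 8 * rtt_s

# Example: 10 Gbit/s link with a 100 ms RTT
bdp = bdp_bytes(10e9, 0.100)   # 125,000,000 bytes (125 MB) in flight
segments = bdp / 1460          # in MSS-sized segments

# After a halving, additive increase regains ~1 MSS per RTT, so recovering
# the lost half of the window takes tens of thousands of RTTs -- over an
# hour at 100 ms each. That is the inefficiency loss-based CC suffers.
print(int(segments))           # 85616
```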

| Algorithm | Signal used | Best for | Default on |
| --- | --- | --- | --- |
| TCP Reno | Packet loss | Low-bandwidth links | Legacy |
| TCP Cubic | Packet loss (cubic growth curve) | High-bandwidth, long-delay links | Linux default since kernel 2.6.19 |
| TCP BBR (Bottleneck Bandwidth and RTT) | Measured bandwidth + RTT, not loss | High-speed links, long distances, lossy links | Used by Google; opt-in on Linux |
| QUIC (HTTP/3) | Pluggable per-connection CC over UDP (often Cubic or BBR) | Web, mobile, high-packet-loss scenarios | Chrome, Cloudflare, YouTube |
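On Linux you can check which algorithm is in use and switch it via sysctl. A sketch assuming a mainline kernel with the BBR module available (switching requires root):

```shell
# Algorithms currently loaded into the kernel
sysctl net.ipv4.tcp_available_congestion_control
# The current default for new connections
sysctl net.ipv4.tcp_congestion_control
# Opt in to BBR: load the module, then make it the default
sudo modprobe tcp_bbr
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
```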

```sh
# See CWND and other TCP state per connection
ss -tin
# Output includes lines like:
#   cwnd:10 ssthresh:7 bytes_acked:1448 rcv_rtt:11.563
#   rto:324 rtt:123.456/12.345 ato:40 mss:1448 pmtu:1500

# Watch CWND changes live
watch -n 1 'ss -tin | grep cwnd'
```
| Symptom | Likely cause |
| --- | --- |
| Slow file transfer on a fast link | Congestion window never opens fully; check RTT and SSTHRESH |
| Speed is good initially, then drops | Buffer overflow causes loss; congestion control kicks in |
| Some paths fast, others slow | Different congestion levels per path |
| High CPU during transfers | Interrupt coalescing / GRO settings, not congestion control |
| YouTube buffers but downloads are fine | Different CC behavior for streaming vs. bulk transfer |

Why This Matters for Application Developers

  • Small messages (RPC, API calls) often never leave slow start - size matters
  • HTTP/1.1 keep-alive reuses connections to preserve CWND state (vs. new connection = slow start again)
  • HTTP/2 multiplexing sends multiple streams on one connection - shares CWND efficiently
  • TCP_NODELAY disables Nagle’s algorithm (which buffers small packets) - use for interactive apps (SSH, gaming), not bulk transfers
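Setting TCP_NODELAY from code is a single setsockopt call. A minimal Python sketch (the option can be set and read back without ever connecting):

```python
import socket

# Disable Nagle's algorithm on a fresh TCP socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm it took effect
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # 1 on Linux
s.close()
```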
```sh
# ss does not report TCP_NODELAY; observe it being set by tracing setsockopt()
strace -f -e trace=setsockopt curl -s https://example.com -o /dev/null
```