
Introduction to System Administration

System administration (sysadmin) is the discipline of maintaining reliable computer systems in a multi-user environment. A sysadmin manages and maintains an organization’s IT infrastructure - the combination of hardware, software, network, and services that keep things running.

In a small company, one sysadmin might do everything. In larger organizations, the work splits across specialized roles: network admin, security engineer, cloud ops, etc.


A server is a machine (or software) that provides services to other machines. A client is the machine that requests those services.

| Concept | Description |
| --- | --- |
| Server | Hosts services - web server, email server, file server, SSH server |
| Client | Requests services - your laptop hitting a web server, Outlook fetching email |
| Rack server | Standard horizontal form factor for data centers |
| Blade server | Thin, modular server that slots into a shared chassis |
| KVM switch | One keyboard, video display, and mouse controlling multiple physical servers |

  • A single server can serve many clients simultaneously
  • A single client can use multiple servers at once
  • Server hardware is optimized for reliability and throughput - redundant power supplies, ECC RAM, hot-swappable drives

Cloud computing is accessing compute, storage, and services over the internet instead of running them on local hardware.

| Model | What You Manage | Example |
| --- | --- | --- |
| On-premises | Everything - hardware, OS, apps, networking | Your own data center |
| IaaS (Infrastructure as a Service) | OS and above - hardware is the provider’s problem | AWS EC2, Azure VMs, GCP Compute |
| PaaS (Platform as a Service) | Just your code - runtime is managed | Heroku, Azure App Service, Google App Engine |
| SaaS (Software as a Service) | Nothing - just use it | Gmail, Slack, Salesforce |
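
The split of responsibility in the table above can be sketched as a lookup over a simplified stack. The layer names and the `customer_managed` helper are illustrative assumptions, not a formal standard; real providers draw the line in more detail (for example, virtual networking in IaaS is usually still configured by the customer).

```python
# Layers of a simplified stack, ordered from bottom (physical) to top.
# These layer names are an illustrative assumption.
STACK = ["hardware", "physical network", "os", "runtime", "application"]

# Lowest layer the customer is responsible for under each model.
# None means the provider manages the whole stack.
CUSTOMER_STARTS_AT = {
    "on-premises": "hardware",
    "iaas": "os",
    "paas": "application",
    "saas": None,
}


def customer_managed(model: str) -> list[str]:
    """Return the layers the customer manages under the given model."""
    start = CUSTOMER_STARTS_AT[model.lower()]
    if start is None:
        return []
    return STACK[STACK.index(start):]


for model in ("On-premises", "IaaS", "PaaS", "SaaS"):
    print(model, "->", customer_managed(model))
```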

Sysadmins manage the full lifecycle of users and their equipment:

Hardware lifecycle:

  1. Procurement - purchase or allocate hardware for an employee
  2. Deployment - image the machine, configure hostname, install required software
  3. Maintenance - apply updates, troubleshoot hardware failures
  4. Retirement - securely wipe data, decommission hardware from the fleet

User lifecycle:

  • Create accounts and provision access to resources (email, file shares, VPN)
  • Apply the principle of least privilege - users get only the access they need
  • When a user leaves: revoke access, wipe their machine, recycle hardware

Routine maintenance:

  • Batch updates: group security patches and deploy them on a regular cycle (often monthly), not one at a time
  • Coordinate taking services offline, applying updates, and verifying services come back correctly
  • Stay on top of security patches - unpatched systems are among the most commonly exploited attack vectors
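
The least-privilege and offboarding points from the user lifecycle can be sketched as simple set operations. The role names, resource names, and both helper functions below are hypothetical; a real environment would pull this data from a directory service or identity provider.

```python
# Hypothetical role definitions: the access each role actually needs.
ROLE_ACCESS = {
    "engineer": {"email", "vpn", "source-repo"},
    "accountant": {"email", "finance-share"},
}


def excess_access(role: str, granted: set[str]) -> set[str]:
    """Return grants that go beyond what the role requires."""
    return granted - ROLE_ACCESS[role]


def offboard(granted: set[str]) -> list[str]:
    """List revocation actions for a departing user, in a stable order."""
    return [f"revoke {resource}" for resource in sorted(granted)]


# An accountant who somehow also has VPN access violates least privilege.
print(excess_access("accountant", {"email", "finance-share", "vpn"}))
print(offboard({"vpn", "email"}))
```

Running an audit like `excess_access` periodically is one way to catch access that accumulated over time and is no longer needed.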

Sysadmins define (or enforce) IT policies:

  • Should users be allowed to install arbitrary software? (Usually no)
  • Password requirements - minimum length, complexity, rotation
  • Acceptable use - what can company devices/networks be used for?
  • Device security - are company phones required to have a passcode?

Document everything in an internal wiki or knowledge base so policies are transparent and accessible.
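
A password policy like the one listed above is easier to enforce when it is written down as code. The following is a minimal sketch; the 12-character minimum and the specific complexity rules are assumptions chosen for illustration, not values from any particular standard.

```python
import string

MIN_LENGTH = 12  # assumed value; set this to match your written policy


def password_problems(password: str, min_length: int = MIN_LENGTH) -> list[str]:
    """Return a list of policy violations; an empty list means it passes."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not any(c in string.punctuation for c in password):
        problems.append("no punctuation character")
    return problems


print(password_problems("short"))
print(password_problems("Correct-Horse-7-Battery"))
```

Returning the list of problems, instead of a bare pass/fail, lets the user see exactly which rule was not met.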


Change management is the process of planning, communicating, and implementing IT changes while minimizing disruption.

| Element | Purpose |
| --- | --- |
| Responsible person/team | Who owns this change |
| Priority | Critical security patch? Low-priority feature? |
| Description & scope | What changes, what’s affected |
| Schedule | When - typically off-hours (Friday evening through weekend) |
| Rollback plan | How to undo if things go wrong |
| Testing results | What happened when tested in a staging environment |
| Risk level | Impact if the change fails |
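
The elements of a change request can be captured as a small record that refuses to go to review while any field is empty. This is a sketch only: the `ChangeRequest` class, its field names, and the sample values are hypothetical and simply mirror the elements listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangeRequest:
    owner: str
    priority: str
    description: str
    schedule: str
    rollback_plan: str
    testing_results: str
    risk_level: str

    def missing_fields(self) -> list[str]:
        """Names of fields that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready_for_review(self) -> bool:
        return not self.missing_fields()


request = ChangeRequest(
    owner="platform team",
    priority="critical security patch",
    description="Upgrade TLS library on web servers",
    schedule="Friday 22:00",
    rollback_plan="",  # deliberately left empty for the example
    testing_results="Passed in staging",
    risk_level="medium",
)
print(request.missing_fields())
print(request.ready_for_review())
```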

Ground rules for implementing a change:

  1. Never test in production - always use a test environment that mirrors production
  2. Document everything - commands executed, output observed, decisions made
  3. Have a rollback plan - know how to revert before you start
  4. Use canaries - deploy to a small subset of servers first, monitor, then expand
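
The canary and rollback rules above can be sketched as a two-stage rollout. The `deploy`, `healthy`, and `rollback` callables are hypothetical stand-ins for whatever tooling actually performs those steps; this is an outline of the control flow, not a deployment tool.

```python
from typing import Callable, Iterable


def canary_rollout(
    servers: Iterable[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    canary_count: int = 1,
) -> bool:
    """Deploy to a few canaries first, then to the rest if they stay healthy.

    On any unhealthy stage, roll back everything deployed so far.
    """
    servers = list(servers)
    canaries, rest = servers[:canary_count], servers[canary_count:]
    done: list[str] = []
    for stage in (canaries, rest):
        for server in stage:
            deploy(server)
            done.append(server)
        if not all(healthy(s) for s in stage):
            for s in reversed(done):
                rollback(s)
            return False
    return True


# Example with stand-in functions: the canary "web1" reports unhealthy,
# so the rollout stops before touching web2 and web3.
deployed, rolled_back = [], []
ok = canary_rollout(
    ["web1", "web2", "web3"],
    deploy=deployed.append,
    healthy=lambda s: s != "web1",
    rollback=rolled_back.append,
)
print(ok, deployed, rolled_back)
```

In practice the health check would run after a monitoring period, not immediately after the deploy call.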

In large organizations, a Change Advisory Board (CAB) reviews and approves proposed changes, assesses risk, and ensures compliance with business goals and regulations.


Sysadmins work with vendors - hardware suppliers, software providers, service contractors. Vendor management typically covers:

  • Hardware supply, pricing, and volume discounts
  • Business accounts with vendors like Dell, HP, Apple
  • Formal approval processes for vendor relationships

Commercial products follow a lifecycle:

Beta → Release & Primary Support → Extended Support → End of Life (EOL)

| Phase | Vendor Support |
| --- | --- |
| Primary support | Full updates, security patches, driver updates |
| Extended support | Critical patches only, product being phased out |
| EOL | No more support, patches, or updates - product becomes a legacy security risk |
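
Tracking where each product sits in this lifecycle is a matter of comparing dates. The sketch below assumes you record three dates per product from the vendor's published schedule; the function name and the example dates are made up for illustration.

```python
from datetime import date


def support_phase(
    today: date, released: date, primary_end: date, extended_end: date
) -> str:
    """Classify a product's lifecycle phase on a given day."""
    if today < released:
        return "beta"
    if today <= primary_end:
        return "primary support"
    if today <= extended_end:
        return "extended support"
    return "end of life"


# Hypothetical product dates, not taken from any real vendor.
released = date(2020, 1, 1)
primary_end = date(2023, 1, 1)
extended_end = date(2025, 1, 1)

print(support_phase(date(2024, 6, 1), released, primary_end, extended_end))
```

Running a check like this across the software inventory gives early warning of products approaching EOL, while there is still time to plan a migration.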

As a sysadmin, troubleshooting shifts from “one user, one machine” to managing issues across an entire fleet:

  • Prioritize by impact - how many users are affected? Is it a single workstation or a critical server?
  • Centralized logging - aggregate logs from all machines on a central platform (a syslog server, the ELK stack, Splunk) so you can search across the fleet
  • Ticketing systems - Jira, ServiceNow, Zendesk - to organize, prioritize, and track issues, and document resolutions
  • Reproduction cases - before fixing, document the exact steps to reproduce the problem, the unexpected result, and the expected result
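
Prioritizing by impact and requiring a reproduction case can be sketched as a small ticket model. The `Ticket` fields and the ordering rule (critical servers first, then number of users affected) are assumptions made for the example; real ticketing systems such as those named above have their own schemas.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    summary: str
    users_affected: int
    critical_server: bool
    steps_to_reproduce: str = ""
    expected_result: str = ""
    actual_result: str = ""

    def has_repro_case(self) -> bool:
        """True only when steps, expected and actual results are all recorded."""
        return all(
            [self.steps_to_reproduce, self.expected_result, self.actual_result]
        )


def triage(tickets: list[Ticket]) -> list[Ticket]:
    """Order tickets so critical servers and wide impact come first."""
    return sorted(
        tickets,
        key=lambda t: (t.critical_server, t.users_affected),
        reverse=True,
    )


queue = [
    Ticket("Printer offline on floor 2", users_affected=15, critical_server=False),
    Ticket("Mail server rejecting logins", users_affected=300, critical_server=True),
    Ticket("One laptop will not boot", users_affected=1, critical_server=False),
]
for ticket in triage(queue):
    print(ticket.summary, "| repro case recorded:", ticket.has_repro_case())
```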

With Great Power Comes Great Responsibility

Admin rights give you full control - use them carefully:

  • Only use admin rights when necessary - don’t browse the web in an admin session
  • Respect user privacy - don’t access personal data without valid justification and process
  • Think before you execute - destructive commands can’t always be undone
  • Keep copies of anything that might be lost during changes
  • Document your actions - what you ran, what happened, and what you rolled back
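
One way to make documenting your actions automatic is to run commands through a wrapper that records them. This is a minimal sketch: `run_logged` and the log format are hypothetical, and a production setup would more likely rely on shell history, `sudo` logging, or a dedicated audit tool.

```python
import subprocess
from datetime import datetime, timezone


def run_logged(command: list[str], log_path: str) -> subprocess.CompletedProcess:
    """Run a command and append what was run, and what happened, to a log file."""
    result = subprocess.run(command, capture_output=True, text=True)
    timestamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{timestamp} cmd={command!r} exit={result.returncode}\n")
        log.write(result.stdout)
        log.write(result.stderr)
    return result
```

Calling `run_logged(["hostname"], "admin-actions.log")` would, for example, append a timestamped entry with the command, its exit code, and its output to `admin-actions.log` (a file name chosen for illustration).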