Introduction to System Administration

What Is System Administration?

System administration is the discipline of keeping computer systems reliable in a multi-user environment. A system administrator (sysadmin) manages and maintains an organization’s IT infrastructure - the combination of hardware, software, networks, and services that keeps the organization running.

In a small company, one sysadmin might do everything. In larger organizations, the work splits across specialized roles such as network administrator, security engineer, and cloud operations.

Servers and Clients

A server is a machine (or software) that provides services to other machines. A client is the machine (or software) that requests those services.

Concept	Description
Server	Hosts services - web server, email server, file server, SSH server
Client	Requests services - your laptop hitting a web server, Outlook fetching email
Rack server	Standard horizontal form factor for data centers
Blade server	Thin, modular server that slots into a shared chassis
KVM switch	Lets one keyboard, video display, and mouse control multiple physical servers

A single server can serve many clients simultaneously
A single client can use multiple servers at once
Server hardware is optimized for reliability and throughput - redundant power supplies, ECC RAM, hot-swappable drives
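
To make the server/client relationship concrete, here is a minimal sketch using Python's standard library. The uppercase "service", the handler name, and the helper functions are hypothetical and exist only for illustration; a real service would speak a real protocol such as HTTP or SSH.

```python
import socket
import socketserver
import threading


class UppercaseHandler(socketserver.StreamRequestHandler):
    """Hypothetical service: reply to one line of text with its uppercase form."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def start_server():
    """Start the server on a free localhost port; return (server, port)."""
    # ThreadingTCPServer handles each client connection in its own thread,
    # so one server can serve several clients at once.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UppercaseHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def request(port, text):
    """Act as a client: send one line and return the server's reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall((text + "\n").encode())
        return conn.makefile().readline().strip()
```

Calling start_server() once and then request(port, "web") from several clients shows the pattern from the list above: one server process answering many independent requests.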

The Cloud

Cloud computing is accessing compute, storage, and services over the internet instead of running them on local hardware.

Model	What You Manage	Example
On-premises	Everything - hardware, OS, apps, networking	Your own data center
IaaS (Infrastructure as a Service)	OS and above - hardware is the provider’s problem	AWS EC2, Azure VMs, GCP Compute
PaaS (Platform as a Service)	Just your code - runtime is managed	Heroku, Azure App Service, Google App Engine
SaaS (Software as a Service)	Nothing - just use it	Gmail, Slack, Salesforce

Core Sysadmin Responsibilities

User & Hardware Provisioning

Sysadmins manage the full lifecycle of users and their equipment:

Hardware lifecycle:

Procurement - purchase or allocate hardware for an employee
Deployment - image the machine, configure hostname, install required software
Maintenance - apply updates, troubleshoot hardware failures
Retirement - securely wipe data, decommission hardware from the fleet

User lifecycle:

Create accounts and provision access to resources (email, file shares, VPN)
Apply principle of least privilege - users get only the access they need
When a user leaves: revoke access, wipe their machine, recycle hardware
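
The user lifecycle above can be sketched in a few lines of Python. The role names, resource names, and the in-memory dictionary are assumptions made for illustration; in practice this state would live in a directory service or identity provider.

```python
# Hypothetical role-to-resource mapping (not from any real system).
ROLE_ACCESS = {
    "engineer": {"email", "vpn", "source-repo"},
    "accountant": {"email", "finance-share"},
}


def provision(accounts, username, role):
    """Create an account that holds only the access its role requires."""
    accounts[username] = set(ROLE_ACCESS[role])
    return accounts[username]


def offboard(accounts, username):
    """Revoke all access for a departing user and return what was revoked."""
    return accounts.pop(username, set())
```

Granting access by role rather than per person is one common way to apply least privilege: an accountant never receives VPN or repository access unless the role definition changes.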

Routine Maintenance

Batch updates - group security patches and deploy them on a regular cycle (often monthly) rather than one at a time
Coordinate taking services offline, applying updates, and verifying services come back correctly
Stay on top of security patches - unpatched known vulnerabilities are among the most commonly exploited attack vectors
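
As a rough illustration of batching, the sketch below groups pending patches into one batch per calendar month. The patch names and dates are made up, and a monthly cycle is only one possible schedule.

```python
from collections import defaultdict
from datetime import date


def batch_by_cycle(patches):
    """Group (name, release_date) pairs into one batch per calendar month."""
    batches = defaultdict(list)
    for name, released in patches:
        batches[(released.year, released.month)].append(name)
    return dict(batches)


# Hypothetical pending patches.
pending = [
    ("openssl-fix", date(2024, 3, 4)),
    ("kernel-update", date(2024, 3, 19)),
    ("browser-fix", date(2024, 4, 2)),
]
batches = batch_by_cycle(pending)
```

Here the two March patches would go out together in one maintenance window instead of causing two separate outages.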

Organizational Policies

Sysadmins define (or enforce) IT policies:

Should users be allowed to install arbitrary software? (Usually no)
Password requirements - minimum length, complexity, rotation
Acceptable use - what can company devices/networks be used for?
Device security - are company phones required to have a passcode?

Document everything in an internal wiki or knowledge base so policies are transparent and accessible.
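
Policies like the password requirements above are easier to enforce when they are written down as explicit checks. The following is a minimal sketch; the thresholds (12 characters, 90 days) and the complexity rule are assumptions chosen for illustration, since the right values depend on the organization's policy.

```python
from datetime import date, timedelta

# Hypothetical thresholds; set these from the organization's actual policy.
MIN_LENGTH = 12
MAX_AGE = timedelta(days=90)


def password_violations(password: str, last_changed: date, today: date) -> list:
    """Return a list of policy problems; an empty list means compliant."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append("too short")
    # Complexity: require at least one lowercase letter, uppercase letter, digit.
    classes = [str.islower, str.isupper, str.isdigit]
    if not all(any(check(ch) for ch in password) for check in classes):
        problems.append("needs lowercase, uppercase, and a digit")
    if today - last_changed > MAX_AGE:
        problems.append("rotation overdue")
    return problems
```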

Change Management

Change management is the process of planning, communicating, and implementing IT changes while minimizing disruption.

Key Elements of a Change Plan

Element	Purpose
Responsible person/team	Who owns this change
Priority	Critical security patch? Low-priority feature?
Description & scope	What changes, what’s affected
Schedule	When - typically off-hours (Friday evening through weekend)
Rollback plan	How to undo if things go wrong
Testing results	What happened when tested in a staging environment
Risk level	Impact if the change fails
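
One way to keep change plans complete is to treat the elements in the table as required fields. The sketch below models a plan as a Python dataclass; the class and field names are hypothetical and simply mirror the table rows.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangePlan:
    """One field per element of the change plan table (names are illustrative)."""
    owner: str
    priority: str
    description: str
    schedule: str
    rollback_plan: str
    testing_results: str
    risk_level: str

    def missing_elements(self):
        """Return the names of fields that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]
```

A review step could refuse any plan for which missing_elements() returns a non-empty list, for example a plan submitted without a rollback plan.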

Golden Rules

Never test in production - always use a test environment that mirrors production
Document everything - commands executed, output observed, decisions made
Have a rollback plan - know how to revert before you start
Use canaries - deploy to a small subset of servers first, monitor, then expand
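
The canary rule can be sketched as a small function. Here deploy and is_healthy are hypothetical callables supplied by the caller; real deployments would use whatever configuration management and monitoring tools the organization already has.

```python
def canary_rollout(servers, deploy, is_healthy, canary_count=1):
    """Deploy to a few canary servers first; expand only if they stay healthy."""
    canaries, rest = servers[:canary_count], servers[canary_count:]
    for server in canaries:
        deploy(server)
    # Stop here if any canary looks unhealthy; the rest of the fleet is untouched.
    if not all(is_healthy(server) for server in canaries):
        return {"deployed": canaries, "halted": True}
    for server in rest:
        deploy(server)
    return {"deployed": servers, "halted": False}
```

When the rollout halts, only the canaries need the rollback plan applied, which is the point of starting small.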

Change Advisory Board (CAB)

In large organizations, a CAB reviews and approves proposed changes, assesses their risk, and ensures they align with business goals and comply with regulations.

Vendor Management

Sysadmins work with vendors - hardware suppliers, software providers, service contractors.

Procurement Considerations

Hardware supply, pricing, and volume discounts
Business accounts with vendors like Dell, HP, Apple
Formal approval processes for vendor relationships

Product End of Life (EOL)

Commercial products follow a lifecycle:

Beta → Release & Primary Support → Extended Support → End of Life (EOL)

Phase	Vendor Support
Primary support	Full updates, security patches, driver updates
Extended support	Critical patches only, product being phased out
EOL	No more support, patches, or updates - product becomes a legacy security risk
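
For fleet inventory, it can help to compute which phase a product is in from its published support dates. This is a minimal sketch; the function name is hypothetical, the dates must come from the vendor's own lifecycle announcements, and the beta phase is left out for brevity.

```python
from datetime import date


def support_phase(today: date, primary_end: date, extended_end: date) -> str:
    """Classify a product by comparing today's date to its support end dates."""
    if today <= primary_end:
        return "primary support"
    if today <= extended_end:
        return "extended support"
    return "EOL"
```

Running this across an inventory list would flag products that are already EOL or about to leave extended support, so replacements can be budgeted before patches stop.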

Troubleshooting at Scale

For a sysadmin, troubleshooting shifts from “one user, one machine” to managing issues across an entire fleet:

Prioritize by impact - how many users are affected? Is it a single workstation or a critical server?
Centralized logging - aggregate logs from all machines into a central platform (a syslog server, the ELK stack, Splunk) so you can search across the fleet
Ticketing systems - use tools such as Jira, ServiceNow, or Zendesk to organize, prioritize, and track issues and to document resolutions
Reproduction cases - before fixing, document the exact steps to reproduce the problem, the unexpected result, and the expected result
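
Two of the ideas above, prioritizing by impact and searching centralized logs, can be sketched in plain Python. The ticket fields, hostnames, and log lines are invented for illustration; real ticketing and logging platforms provide their own query interfaces.

```python
def triage(tickets):
    """Order tickets so critical servers and widely felt issues come first."""
    return sorted(
        tickets,
        key=lambda t: (t["critical_server"], t["users_affected"]),
        reverse=True,
    )


def search_logs(central_logs, keyword):
    """Return (host, line) pairs from every host whose line contains keyword."""
    return [
        (host, line)
        for host, lines in central_logs.items()
        for line in lines
        if keyword in line
    ]
```

With logs gathered in one place, a single search_logs call answers "which machines are seeing this error?" without logging in to each host.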

With Great Power Comes Great Responsibility

Admin rights give you full control - use them carefully:

Only use admin rights when necessary - don’t browse the web in an admin session
Respect user privacy - don’t access personal data without valid justification and process
Think before you execute - destructive commands can’t always be undone
Keep copies of anything that might be lost during changes
Document your actions - what you ran, what happened, and what you changed or rolled back
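
As a small example of keeping copies before a change, the sketch below backs up a file before overwriting it. The function name and the .bak suffix are assumptions; a real workflow might rely on version control or snapshots instead.

```python
import shutil
from pathlib import Path


def backup_then_write(path, new_text):
    """Copy the file to <name>.bak, then overwrite it with new_text."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    # copy2 also preserves timestamps and permissions where possible.
    shutil.copy2(path, backup)
    path.write_text(new_text)
    return backup
```

If the change goes wrong, copying the .bak file back restores the previous state.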