Introduction to System Administration
What Is System Administration?
System administration (sysadmin) is the discipline of maintaining reliable computer systems in a multi-user environment. A sysadmin manages and maintains an organization’s IT infrastructure - the combination of hardware, software, network, and services that keep things running.
In a small company, one sysadmin might do everything. In larger organizations, the work splits across specialized roles: network admin, security engineer, cloud ops, etc.
Servers and Clients
A server is a machine (or software) that provides services to other machines. A client is the machine that requests those services.
| Concept | Description |
|---|---|
| Server | Hosts services - web server, email server, file server, SSH server |
| Client | Requests services - your laptop hitting a web server, Outlook fetching email |
| Rack server | Server built to mount horizontally in a standard 19-inch data center rack; height is measured in rack units (for example, 1U or 2U) |
| Blade server | Thin, modular server that slots into a shared chassis |
| KVM switch | One keyboard, video, and mouse controlling multiple physical servers |
- A single server can serve many clients simultaneously
- A single client can use multiple servers at once
- Server hardware is optimized for reliability and throughput - redundant power supplies, ECC RAM, hot-swappable drives
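The one-server, many-clients relationship can be sketched with Python's standard `socketserver` module. This is a minimal echo service bound to the loopback interface for illustration only; `EchoHandler`, `start_server`, and `client_request` are hypothetical names, and a real service would add authentication, timeouts, and error handling.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    """Serves one client connection: reads a line and echoes it back."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(b"echo: " + line)


def start_server():
    # Port 0 lets the OS pick a free port; ThreadingTCPServer gives each
    # client connection its own thread, so many clients are served at once.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def client_request(address, message):
    # A client opens a connection, sends one request line, and reads the reply.
    with socket.create_connection(address) as sock, sock.makefile("rb") as reader:
        sock.sendall(message + b"\n")
        return reader.readline()


if __name__ == "__main__":
    server = start_server()
    results = {}

    def worker(n):
        results[n] = client_request(server.server_address, b"client %d" % n)

    # Three clients talk to the same server concurrently.
    workers = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(sorted(results.values()))
    server.shutdown()
    server.server_close()
```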
The Cloud
Cloud computing is accessing compute, storage, and services over the internet instead of running them on local hardware.
| Model | What You Manage | Example |
|---|---|---|
| On-premises | Everything - hardware, OS, apps, networking | Your own data center |
| IaaS (Infrastructure as a Service) | OS and above - hardware is the provider’s problem | AWS EC2, Azure Virtual Machines, Google Compute Engine |
| PaaS (Platform as a Service) | Just your code - runtime is managed | Heroku, Azure App Service, Google App Engine |
| SaaS (Software as a Service) | Nothing - just use it | Gmail, Slack, Salesforce |
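The split of responsibilities in the table can be expressed as data: each model draws a line somewhere in the stack, and the customer manages everything above it. The five-layer breakdown and the names `LAYERS`, `customer_manages`, and `provider_manages` are simplifying assumptions for illustration, not a standard taxonomy.

```python
# Stack layers from the physical hardware up to the application code.
# This breakdown is a simplifying assumption for illustration.
LAYERS = ["hardware", "physical network", "operating system", "runtime", "application"]

# Index of the lowest layer the customer manages under each model;
# every layer below that index is handled by the provider.
FIRST_CUSTOMER_LAYER = {
    "On-premises": 0,        # customer manages everything
    "IaaS": 2,               # OS and above
    "PaaS": 4,               # just the application code
    "SaaS": len(LAYERS),     # nothing
}


def customer_manages(model):
    """Layers the customer is responsible for under a given service model."""
    return LAYERS[FIRST_CUSTOMER_LAYER[model]:]


def provider_manages(model):
    """Layers the provider is responsible for under a given service model."""
    return LAYERS[:FIRST_CUSTOMER_LAYER[model]]


if __name__ == "__main__":
    for model in FIRST_CUSTOMER_LAYER:
        print(f"{model}: customer manages {customer_manages(model)}")
```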
Core Sysadmin Responsibilities
User & Hardware Provisioning
Sysadmins manage the full lifecycle of users and their equipment:
Hardware lifecycle:
- Procurement - purchase or allocate hardware for an employee
- Deployment - image the machine, configure the hostname, install required software
- Maintenance - apply updates, troubleshoot hardware failures
- Retirement - securely wipe data, decommission hardware from the fleet
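The hardware lifecycle behaves like a small state machine: each asset moves forward through the stages and never returns from retirement. The sketch below assumes that maintenance can repeat any number of times and that a machine must be deployed before it can be retired; `Stage`, `Asset`, and the asset tag format are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    PROCUREMENT = "procurement"
    DEPLOYMENT = "deployment"
    MAINTENANCE = "maintenance"
    RETIREMENT = "retirement"


# Allowed moves between stages (an assumption for this sketch). Maintenance
# can repeat; once retired, a machine leaves the fleet for good.
TRANSITIONS = {
    Stage.PROCUREMENT: {Stage.DEPLOYMENT},
    Stage.DEPLOYMENT: {Stage.MAINTENANCE},
    Stage.MAINTENANCE: {Stage.MAINTENANCE, Stage.RETIREMENT},
    Stage.RETIREMENT: set(),
}


class Asset:
    """Tracks one piece of hardware and every stage it has passed through."""

    def __init__(self, asset_tag):
        self.asset_tag = asset_tag
        self.stage = Stage.PROCUREMENT
        self.history = [Stage.PROCUREMENT]

    def advance(self, new_stage):
        # Reject moves the lifecycle does not allow, e.g. redeploying a retired box.
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(
                f"{self.asset_tag}: cannot move from {self.stage.value} to {new_stage.value}"
            )
        self.stage = new_stage
        self.history.append(new_stage)
```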
User lifecycle:
- Create accounts and provision access to resources (email, file shares, VPN)
- Apply the principle of least privilege - users get only the access they need
- When a user leaves: revoke access, wipe their machine, recycle hardware
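Least privilege is easiest to enforce when access comes from a role rather than being granted ad hoc, and offboarding then revokes everything in one step. This is a toy in-memory sketch; the role names, resource names, and the `Directory` class are hypothetical stand-ins for a real directory service.

```python
# Hypothetical role -> resources mapping. Under least privilege, a user
# receives exactly their role's grants and nothing more.
ROLE_ACCESS = {
    "engineer": {"email", "vpn", "source-repos"},
    "accountant": {"email", "vpn", "finance-share"},
    "contractor": {"email"},
}


class Directory:
    """Toy in-memory account store standing in for a real directory service."""

    def __init__(self):
        self.accounts = {}  # username -> set of granted resources

    def onboard(self, username, role):
        if role not in ROLE_ACCESS:
            raise KeyError(f"unknown role: {role}")
        # Copy the set so later changes to one user never affect the role template.
        self.accounts[username] = set(ROLE_ACCESS[role])

    def offboard(self, username):
        """Revoke every grant and return what was removed, for the audit trail."""
        return self.accounts.pop(username, set())

    def has_access(self, username, resource):
        return resource in self.accounts.get(username, set())
```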
Routine Maintenance
- Batch updates: group security patches and deploy them on a regular cycle (often monthly) rather than one at a time
- Coordinate taking services offline, applying updates, and verifying services come back correctly
- Stay on top of security patches - unpatched known vulnerabilities are among the most commonly exploited attack vectors
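The batching and the offline-update-verify sequence can be sketched together. The monthly grouping follows the cycle mentioned above; the patch IDs are made up, and `stop`, `apply_patch`, `start`, and `is_healthy` are hypothetical hooks that would wrap your real service manager and package manager.

```python
from collections import defaultdict
from datetime import date


def batch_by_month(patches):
    """Group (patch_id, release_date) pairs into one batch per calendar month."""
    batches = defaultdict(list)
    for patch_id, released in patches:
        batches[(released.year, released.month)].append(patch_id)
    return dict(batches)


def run_maintenance(service, patch_ids, stop, apply_patch, start, is_healthy):
    """Take a service offline, apply a batch of patches, then verify it came back.

    The four callables are hypothetical hooks supplied by the caller.
    Returns True if the service passes its health check afterwards.
    """
    stop(service)
    try:
        for patch_id in patch_ids:
            apply_patch(service, patch_id)
    finally:
        # Bring the service back up even if a patch failed partway through.
        start(service)
    return is_healthy(service)
```

A failed health check is the signal to invoke the rollback plan rather than leave the service half-updated.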
Organizational Policies
Sysadmins define (or enforce) IT policies:
- Should users be allowed to install arbitrary software? (Usually no)
- Password requirements - minimum length, complexity, rotation
- Acceptable use - what can company devices/networks be used for?
- Device security - are company phones required to have a passcode?
Document everything in an internal wiki or knowledge base so policies are transparent and accessible.
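A written password policy is most useful when it can also be checked automatically. The thresholds below (12 characters, 3 of 4 character classes, 90-day maximum age) are example values chosen for this sketch, not a recommendation; substitute whatever your documented policy says. Only ASCII punctuation counts as a symbol here.

```python
import string
from datetime import date, timedelta

# Example thresholds only; replace them with your organization's documented policy.
MIN_LENGTH = 12
MIN_CHARACTER_CLASSES = 3
MAX_PASSWORD_AGE = timedelta(days=90)

CHARACTER_CLASSES = [
    string.ascii_lowercase,
    string.ascii_uppercase,
    string.digits,
    string.punctuation,  # ASCII symbols only
]


def policy_violations(password, last_changed, today):
    """Return a list of human-readable policy problems (empty if compliant)."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    # Count how many character classes appear at least once in the password.
    classes_used = sum(any(ch in cls for ch in password) for cls in CHARACTER_CLASSES)
    if classes_used < MIN_CHARACTER_CLASSES:
        problems.append(
            f"only {classes_used} of {len(CHARACTER_CLASSES)} character classes used "
            f"(need {MIN_CHARACTER_CLASSES})"
        )
    if today - last_changed > MAX_PASSWORD_AGE:
        problems.append("older than the maximum password age")
    return problems
```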
Change Management
Change management is the process of planning, communicating, and implementing IT changes while minimizing disruption.
Key Elements of a Change Plan
| Element | Purpose |
|---|---|
| Responsible person/team | Who owns this change |
| Priority | Critical security patch? Low-priority feature? |
| Description & scope | What changes, what’s affected |
| Schedule | When - typically off-hours, such as Friday evening through the weekend |
| Rollback plan | How to undo if things go wrong |
| Testing results | What happened when tested in a staging environment |
| Risk level | Impact if the change fails |
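Capturing each change plan as a structured record makes it easy to reject incomplete plans before review. This sketch mirrors the elements in the table above; the `ChangePlan` class, its field names, and the choice of which fields count as mandatory are assumptions for illustration, not a standard change-request format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChangePlan:
    """One record per proposed change (field names are hypothetical)."""
    owner: str                 # responsible person or team
    priority: str              # e.g. "critical", "low"
    description: str
    scope: list[str]           # systems affected by the change
    scheduled_for: datetime
    rollback_plan: str
    testing_results: str       # outcome of the staging test
    risk_level: str            # e.g. "low", "medium", "high"

    def missing_elements(self):
        """Names of required fields that are empty (assumed mandatory set)."""
        required = {
            "owner": self.owner,
            "description": self.description,
            "scope": self.scope,
            "rollback_plan": self.rollback_plan,
            "testing_results": self.testing_results,
        }
        return [name for name, value in required.items() if not value]

    def ready_for_review(self):
        return not self.missing_elements()
```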
Golden Rules
- Never test in production - always use a test environment that mirrors production
- Document everything - commands executed, output observed, decisions made
- Have a rollback plan - know how to revert before you start
- Use canaries - deploy to a small subset of servers first, monitor, then expand
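A canary rollout expands in waves and stops as soon as monitoring reports a problem. In this minimal sketch, `deploy` and `is_healthy` are hypothetical hooks for your deployment tool and monitoring system, and the wave sizes (5%, 25%, then everything) are example values.

```python
import math


def canary_rollout(servers, deploy, is_healthy, stages=(0.05, 0.25, 1.0)):
    """Deploy in expanding waves; stop at the first wave that looks unhealthy.

    `stages` are cumulative fractions of the fleet (example values).
    Returns (servers_deployed, succeeded).
    """
    deployed = []
    for fraction in stages:
        # Always include at least one server in the first wave.
        target = max(1, math.ceil(len(servers) * fraction))
        wave = servers[len(deployed):target]
        for server in wave:
            deploy(server)
        deployed.extend(wave)
        # Monitor everything deployed so far before expanding further.
        if not all(is_healthy(s) for s in deployed):
            return deployed, False
    return deployed, True
```

When it returns `False`, the servers listed in `servers_deployed` are the ones the rollback plan needs to cover.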
Change Advisory Board (CAB)
In large organizations, a CAB reviews and approves proposed changes, assesses their risk, and ensures they align with business goals and comply with regulations.
Vendor Management
Sysadmins work with vendors - hardware suppliers, software providers, service contractors.
Procurement Considerations
- Hardware supply, pricing, and volume discounts
- Business accounts with vendors like Dell, HP, Apple
- Formal approval processes for vendor relationships
Product End of Life (EOL)
Commercial products follow a lifecycle:
Beta → Release & Primary Support → Extended Support → End of Life (EOL)

| Phase | Vendor Support |
|---|---|
| Primary support | Full updates, security patches, driver updates |
| Extended support | Critical patches only, product being phased out |
| EOL | No more support, patches, or updates - product becomes a legacy security risk |
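Tracking vendor lifecycle dates for everything in the fleet lets you spot EOL products before they become a security risk. This sketch classifies a product by comparing today's date with its release, end-of-primary-support, and end-of-extended-support dates; the product names and dates are invented for illustration.

```python
from datetime import date


def support_phase(today, release, primary_end, extended_end):
    """Classify a product by comparing today's date with its vendor lifecycle dates."""
    if today < release:
        return "Beta"
    if today < primary_end:
        return "Primary support"
    if today < extended_end:
        return "Extended support"
    return "EOL"


def eol_report(inventory, today):
    """Names of products past end of life, i.e. candidates for replacement."""
    return [name for name, dates in inventory.items() if support_phase(today, *dates) == "EOL"]


# Hypothetical inventory: name -> (release, primary support ends, extended support ends)
INVENTORY = {
    "ExampleOS 7": (date(2014, 6, 1), date(2019, 8, 6), date(2024, 6, 30)),
    "ExampleOS 9": (date(2022, 5, 1), date(2027, 5, 31), date(2032, 5, 31)),
}

if __name__ == "__main__":
    print(eol_report(INVENTORY, date.today()))
```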
Troubleshooting at Scale
For a sysadmin, troubleshooting shifts from “one user, one machine” to managing issues across an entire fleet:
- Prioritize by impact - how many users are affected? Is it a single workstation or a critical server?
- Centralized logging - forward logs from every machine to a central platform (a syslog server, the ELK stack, Splunk) so you can search across the fleet
- Ticketing systems - use a tool such as Jira, ServiceNow, or Zendesk to organize, prioritize, and track issues and to document resolutions
- Reproduction cases - before fixing, document the exact steps to reproduce the problem, the unexpected result, and the expected result
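Prioritizing by impact can be reduced to a sort key. The ordering rule here is an assumption made for this sketch: anything touching a critical system outranks everything else, and ties are broken by the number of users affected. The `Incident` fields and ticket IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    ticket_id: str
    summary: str
    users_affected: int
    critical_system: bool  # e.g. a production server rather than one workstation


def triage(incidents):
    """Order incidents so the widest-impact problems are handled first.

    Assumed rule: critical systems first, then by number of users affected.
    """
    return sorted(incidents, key=lambda i: (i.critical_system, i.users_affected), reverse=True)


if __name__ == "__main__":
    queue = triage([
        Incident("T-1", "Printer jam on floor 3", 4, False),
        Incident("T-2", "Mail server down", 300, True),
        Incident("T-3", "Laptop won't boot", 1, False),
    ])
    for incident in queue:
        print(incident.ticket_id, incident.summary)
```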
With Great Power Comes Great Responsibility
Admin rights give you full control - use them carefully:
- Only use admin rights when necessary - don’t browse the web in an admin session
- Respect user privacy - don’t access personal data without valid justification and process
- Think before you execute - destructive commands can’t always be undone
- Keep copies of anything that might be lost during changes
- Document your actions - what you ran, what happened, what you changed back
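Keeping copies and documenting actions can be combined into one habit: back up a file before editing it and append a line to a change log. This is a minimal sketch; the `.bak` naming scheme, the log format, and the `backup_before_change` helper are assumptions, not an established convention.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_before_change(path, log_file):
    """Copy a file aside with a timestamped name and record the action in a log."""
    path = Path(path)
    # Microseconds in the stamp keep two quick backups from overwriting each other.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    backup = path.with_name(f"{path.name}.{stamp}.bak")
    # copy2 also preserves file metadata such as modification time and permission bits.
    shutil.copy2(path, backup)
    with open(log_file, "a", encoding="utf-8") as log:
        log.write(f"{datetime.now().isoformat()} backed up {path} -> {backup}\n")
    return backup
```

Restoring is then a matter of copying the `.bak` file back over the original, and the log shows exactly which backup belongs to which change.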