Servers & Environments
Infrastructure as Code covers two tightly related concerns: how you build and manage the servers that run your workloads, and how you organise those workloads into environments. This page covers both - starting with the server lifecycle, then zooming out to environment design.
Servers as Code
The first generation of IaC tools - Ansible, CFEngine, Chef, Puppet, Salt - were built specifically for server configuration. They organise code into modules (playbooks in Ansible, cookbooks in Chef) and apply them to servers through server roles.
Server Composition
Everything on a server falls into one of three categories:
| Category | What it contains | How IaC treats it |
|---|---|---|
| Software | Applications, libraries, static code | Installs and versions it; treats the contents as an opaque box |
| Configuration | Files controlling how software behaves | Manages content directly; varies across roles and environments |
| Data | Logs, database files, user-generated content | May back up or distribute it, but treats the contents as a black box |
The distinction between configuration and data is whether the automation tool actively manages the file’s contents. A system log is vital but treated as data - the automation doesn’t write into it.
Where server content comes from:
- Base OS from a physical disk, ISO, or IaaS stock image
- OS packages from vendor, third-party, or internal repositories
- Language/framework packages (pip, gem, npm, Maven)
- Nonstandard packages via custom installers
- Separate material: firewall rules, user accounts, local overrides
Server Roles
A server role is the entry point for managing a server - it defines which configuration modules to apply and sets default parameter values for them. Roles are used when creating servers manually, defining them in a stack, or configuring auto-scaling groups.
Three common strategies for structuring roles:
| Strategy | Description | Example |
|---|---|---|
| Fine-grained | Multiple narrow roles composed together | ApplicationServer + MonitoredServer + PublicFacingServer |
| Higher-level | One comprehensive role per server type | ShoppingServiceApplicationServer |
| Base + inheritance | Universal base role extended by specialised roles | BaseServer → ContainerHostServer |
The base role approach is most common: it encodes organisation-wide requirements (monitoring agents, admin accounts, network hardening) that all servers must inherit.
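To make the base-plus-inheritance strategy concrete, here is a minimal Python sketch that models a role as a list of configuration modules plus default parameters, with a specialised role extending a base role. The role names come from the table above; the module names and parameters (`monitoring_agent`, `ssh_port`, and so on) are hypothetical placeholders, not the role syntax of Ansible, Chef, or Puppet.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServerRole:
    """Hypothetical model of a role: config modules to apply plus default parameter values."""
    name: str
    modules: list[str] = field(default_factory=list)
    defaults: dict[str, object] = field(default_factory=dict)
    parent: Optional["ServerRole"] = None

    def resolved_modules(self) -> list[str]:
        # Parent modules come first, so organisation-wide requirements are always applied.
        inherited = self.parent.resolved_modules() if self.parent else []
        return inherited + [m for m in self.modules if m not in inherited]

    def resolved_defaults(self) -> dict[str, object]:
        # Child defaults override parent defaults key by key.
        inherited = self.parent.resolved_defaults() if self.parent else {}
        return {**inherited, **self.defaults}


# Hypothetical base role encoding requirements every server must inherit.
base = ServerRole(
    name="BaseServer",
    modules=["monitoring_agent", "admin_accounts", "network_hardening"],
    defaults={"ssh_port": 22, "log_level": "info"},
)

# Specialised role: adds a container runtime and overrides one default.
container_host = ServerRole(
    name="ContainerHostServer",
    modules=["container_runtime"],
    defaults={"log_level": "warn"},
    parent=base,
)

print(container_host.resolved_modules())
print(container_host.resolved_defaults())
```

Putting inherited modules first is a deliberate choice: the specialised role can add to or override defaults, but it cannot drop the base requirements.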
Server Lifecycle
Every server moves through four phases:
```mermaid
flowchart LR
A["1 · Build image\n(optional)"] --> B["2 · Create instance"]
B --> C["3 · Change instance"]
C --> D["4 · Destroy instance"]
```
Phase 1 - Build a Server Image
Platform stock images (AWS AMIs, Azure Marketplace images, GCP public images) are a valid starting point, but teams often build custom images to:
- Pre-install monitoring agents, admin accounts, and standard configs
- Make server creation faster (no runtime package installs)
- Harden security by stripping unnecessary software and accounts
- Produce role-specific images for container nodes, CI agents, app servers
Image building methods:
| Method | Notes |
|---|---|
| Modify a stock image | Most common - boot stock image, configure, save as new image |
| Boot an OS installer | Maximum control; avoids relying on third-party stock images |
| Offline image building | Mount image as a disk volume, configure, unmount - faster but needs more tooling |
| Hot-clone a running server | ❌ Never recommended - inherits data pollution (old logs), produces inconsistent images |
Tooling: HashiCorp Packer is the most popular tool for orchestrating image builds. For images built from a fresh OS, simple sequential shell scripts (e.g., 10-install-monitoring-agent.sh) are often more appropriate than full configuration management tools, which are better suited to managing variable starting conditions on live servers.
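A minimal sketch of the sequential-script approach, assuming the provisioning scripts live in one directory and use numeric prefixes (as in `10-install-monitoring-agent.sh`) to define their order. The runner and the directory layout are illustrative only; Packer has its own provisioners for running scripts, and this is not how Packer implements them.

```python
import subprocess
import tempfile
from pathlib import Path


def ordered_scripts(script_dir: Path) -> list[Path]:
    """Return *.sh files sorted by name, so numeric prefixes (10-, 20-, ...) set the order."""
    return sorted(p for p in script_dir.glob("*.sh") if p.is_file())


def run_image_build_scripts(script_dir: Path, dry_run: bool = True) -> list[str]:
    """Run each script in order; check=True raises on the first failure."""
    executed = []
    for script in ordered_scripts(script_dir):
        if not dry_run:
            # Fail fast: a broken step must not produce a half-configured image.
            subprocess.run(["/bin/sh", str(script)], check=True)
        executed.append(script.name)
    return executed


# Usage: a throwaway directory showing the ordering (dry run, nothing is executed).
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    for name in ["20-harden-ssh.sh", "10-install-monitoring-agent.sh", "notes.txt"]:
        (tmp_dir / name).write_text("#!/bin/sh\nexit 0\n")
    planned = run_image_build_scripts(tmp_dir)

print(planned)  # ['10-install-monitoring-agent.sh', '20-harden-ssh.sh']
```

Because the order comes from a plain lexical sort, prefixes should share the same number of digits; otherwise `100-...` would sort before `20-...`.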
Phase 2 - Create an Instance
Server creation involves four core steps regardless of the tool: selecting a host, allocating resources (CPU, memory, storage, networking), installing the base image, and applying configuration.
Creation triggers:
| Trigger | How |
|---|---|
| Network provisioning | PXE boot → downloads image → reboots → applies config (Cobbler, Foreman, MAAS, RackN) |
| Infrastructure stack | Defined as a resource in Terraform/Pulumi; provisioned via IaaS API |
| Auto-scaling | Platform spawns instances in response to load metrics |
| Auto-recovery | Platform replaces instances that fail health checks |
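The four creation steps can be sketched as a small pipeline. Everything here is hypothetical: the `Host` and `Instance` models, the first-fit placement rule, and the fact that only CPU and memory are modelled (storage and networking are left out for brevity) stand in for whatever a real IaaS API or provisioning tool does.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_cpus: int
    free_memory_gb: int


@dataclass
class Instance:
    host: str
    cpus: int
    memory_gb: int
    image: str = ""
    applied_config: list[str] = field(default_factory=list)


def select_host(hosts: list[Host], cpus: int, memory_gb: int) -> Host:
    # Step 1: first-fit placement, a simplification of real schedulers.
    for host in hosts:
        if host.free_cpus >= cpus and host.free_memory_gb >= memory_gb:
            return host
    raise RuntimeError("no host has enough capacity")


def create_instance(hosts: list[Host], cpus: int, memory_gb: int,
                    image: str, config_modules: list[str]) -> Instance:
    host = select_host(hosts, cpus, memory_gb)
    # Step 2: allocate resources by reserving capacity on the chosen host.
    host.free_cpus -= cpus
    host.free_memory_gb -= memory_gb
    instance = Instance(host=host.name, cpus=cpus, memory_gb=memory_gb)
    # Step 3: install the base image.
    instance.image = image
    # Step 4: apply configuration modules in order.
    instance.applied_config.extend(config_modules)
    return instance


hosts = [Host("host-a", 2, 4), Host("host-b", 8, 32)]
vm = create_instance(hosts, cpus=4, memory_gb=8, image="base-image-v1",
                     config_modules=["monitoring_agent", "app_server"])
print(vm.host)  # host-b
```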
Phase 3 - Change an Instance
Three strategies exist for managing updates to running servers:
Push on Change (antipattern)
Configuration code is applied only when a specific change is needed.
Problem: Newer servers get the latest patches; older ones don’t. The result is a fleet of snowflake servers - each with a subtly different history, each a liability during incident response.
Continuous Configuration Synchronization
Configuration code is applied repeatedly on a schedule, even if the code hasn’t changed. Any manual change to managed configuration (whether by an engineer or an attacker) is automatically reverted on the next cycle.
- Push implementation: A central service connects to each server via SSH to apply updates - requires all servers to be registered and network-accessible
- Pull implementation (more popular): An agent on the server runs on a cron schedule, checks a central repo for the latest code version, and pulls down and applies it
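A minimal sketch of one pull-based synchronization cycle, assuming desired state is a mapping of file paths to contents fetched from a central repository. `fetch_desired_state` and the in-memory `server_files` dictionary are hypothetical stand-ins; a real agent (triggered by cron or running as a daemon) would read the repository and write real files. The property it illustrates is that the cycle runs whether or not the code changed, so drift is reverted.

```python
def fetch_desired_state() -> dict[str, str]:
    """Hypothetical stand-in for pulling the latest configuration code from a central repo."""
    return {
        "/etc/ntp.conf": "server time.example.internal\n",
        "/etc/motd": "Managed by configuration code\n",
    }


def sync_once(actual: dict[str, str]) -> list[str]:
    """Apply desired state; return the managed paths that had drifted and were reverted."""
    reverted = []
    for path, content in fetch_desired_state().items():
        if actual.get(path) != content:
            actual[path] = content  # overwrite manual or malicious changes
            reverted.append(path)
    return reverted


server_files = dict(fetch_desired_state())
server_files["/etc/motd"] = "hand-edited by an engineer\n"  # simulated drift
first = sync_once(server_files)   # ['/etc/motd']
second = sync_once(server_files)  # [] - nothing drifted, but the cycle still ran
```

Only paths present in the desired state are checked, which is why unmanaged files on the server are never reverted.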
Changing by Replacement - Immutable Servers
The most reliable approach: never modify a running server. Instead, build a new image, provision new instances from it, validate them, redirect traffic, and destroy the old instances.
Replacement sequence:
- Create a new instance without putting it into service
- Run validation checks to confirm it is ready
- Switch services to the new instance
- Verify the new instance is handling workload correctly
- Destroy the old instance
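The replacement sequence can be sketched as a function that refuses to destroy the old instance until the new one has passed validation and is serving traffic correctly. The `Fleet` class and the validation callables are hypothetical; in practice each step maps to IaaS and load-balancer API calls.

```python
from typing import Callable


class Fleet:
    """Hypothetical model: which instance is in service, and which instances exist."""

    def __init__(self, in_service: str):
        self.in_service = in_service
        self.instances = {in_service}

    def replace(self, new_instance: str,
                validate: Callable[[str], bool],
                verify_workload: Callable[[str], bool]) -> bool:
        old = self.in_service
        self.instances.add(new_instance)          # create, not yet in service
        if not validate(new_instance):            # pre-service validation checks
            self.instances.discard(new_instance)  # drop the bad candidate; old stays live
            return False
        self.in_service = new_instance            # switch traffic
        if not verify_workload(new_instance):     # verify it handles real workload
            self.in_service = old                 # roll back: old instance still exists
            self.instances.discard(new_instance)
            return False
        self.instances.discard(old)               # destroy the old instance last
        return True


fleet = Fleet("web-v1")
ok = fleet.replace("web-v2", validate=lambda i: True, verify_workload=lambda i: True)

failed = Fleet("web-v1")
bad = failed.replace("web-v2", validate=lambda i: False, verify_workload=lambda i: True)
```

Keeping the old instance alive until the final step is what makes the rollback branch possible.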
The Immutable Server pattern makes replacement the only mechanism for change. Every update goes through a delivery pipeline before it reaches production - no exceptions.
Phase 4 - Destroy an Instance
Under the Immutable Server pattern, destruction happens last - only after the replacement is live and verified. Because the old instance keeps serving until then, the transition can be made without downtime.
Baking Images vs. Frying Instances
Baking and frying describe when configuration is applied to a server.
| | Baking | Frying |
|---|---|---|
| When applied | Before the server is ever started | At instance creation time |
| Optimises for | Speed of boot and consistency | Variability and fast config changes |
| Best for | Auto-scaling, auto-recovery, container nodes | On-demand customised workloads |
| Drawback | Slow path to deploying config changes | Slower boot time (installs on the fly) |
```mermaid
flowchart LR
A["Config change needed"] --> B{Strategy?}
B -- Baking --> C["Build new image\n→ test pipeline\n→ replace instances"]
B -- Frying --> D["Update config code\n→ provision new instances\nwith new config"]
```
The Hybrid Approach (recommended)
In practice most teams combine both: bake large, slow-to-install dependencies into the base image; fry instance-specific parameters at creation time.
Base image (baked):
- JDK / application server
- Container cluster agent
- Monitoring agent

Creation-time scripts (fried):
- Environment name
- Application version
- Feature flags

This gives you the fast boot time of baked images without sacrificing flexibility for customisation.
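Under the hybrid approach, which path a change takes through the bake/fry flowchart depends on what changed. A minimal sketch, assuming the baked and fried items listed above are tracked as explicit sets; the identifiers and the `plan_change` helper are hypothetical.

```python
BAKED = {"jdk", "container_cluster_agent", "monitoring_agent"}
FRIED = {"environment_name", "application_version", "feature_flags"}


def plan_change(changed_items: set[str]) -> str:
    """Pick the delivery path for a change set, following the bake/fry flowchart."""
    unknown = changed_items - BAKED - FRIED
    if unknown:
        raise ValueError(f"unclassified items: {sorted(unknown)}")
    if changed_items & BAKED:
        # Any baked change means a new image, a pipeline run, and instance replacement.
        return "build new image -> test pipeline -> replace instances"
    # Only creation-time parameters changed: reuse the image, provision with new config.
    return "update config code -> provision new instances with new config"


print(plan_change({"application_version"}))
print(plan_change({"monitoring_agent", "feature_flags"}))
```

Rejecting unclassified items forces every new dependency to be placed explicitly on one side of the bake/fry line.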
Pull vs. Push Configuration
When applying configuration to a new or existing server, two architectures exist for how the code gets there.
Pull Configuration (preferred for security)
The server configures itself from within using initialization scripts:
| Cloud | Mechanism |
|---|---|
| AWS | User data |
| Azure | Custom data |
| GCP | Startup scripts |
On most Linux images these mechanisms are processed by cloud-init, which comes preinstalled (GCP startup scripts are run by Google's guest environment instead). On first boot, the script passes a role name and environment parameters to a preinstalled configuration agent (Chef, Puppet, or Ansible run locally via ansible-pull), which downloads and applies the relevant modules.
For ongoing updates, a background agent or cron job periodically pulls the latest code from a central repository.
Security advantage: The server never needs external inbound network access. In high-security environments, SSH doesn’t need to be running at all.
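As an illustration of the first-boot step, the sketch below renders a cloud-config document that hands a role name and environment to a preinstalled agent. The `#cloud-config` header and the `runcmd` key are standard cloud-init; the `/usr/local/bin/configure-server` command and its `--role`/`--environment` flags are hypothetical placeholders for whatever agent invocation your image provides.

```python
import json


def render_user_data(role: str, environment: str) -> str:
    """Render cloud-config user data that runs a (hypothetical) config agent on first boot."""
    command = [
        "/usr/local/bin/configure-server",  # hypothetical wrapper around the config agent
        "--role", role,
        "--environment", environment,
    ]
    # A JSON array is also a valid YAML flow sequence, so json.dumps quotes each argument.
    return "#cloud-config\nruncmd:\n  - " + json.dumps(command) + "\n"


user_data = render_user_data("app-server", "staging")
print(user_data)
```

The resulting string is what would be supplied through the platform's user data or custom data field; the server then pulls its own configuration, with no inbound connection required.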
Push Configuration
A central service connects to the server over the network (typically SSH) and executes configuration commands.
Advantage: No configuration agent needs to be preinstalled on the server image.
Risks:
- Grants a central service root access over the network - if that service is compromised, every registered server is compromised
- Requires diligent tracking to ensure every server is registered; unregistered servers silently miss all updates
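The second risk can be detected mechanically by comparing the push service's inventory with what actually exists on the platform. A minimal sketch, assuming both are available as sets of instance IDs; how you obtain them (an IaaS API, a CMDB) is left open, and the IDs shown are made up.

```python
def audit_push_inventory(registered: set[str], running: set[str]) -> dict[str, set[str]]:
    """Compare the push service's inventory with the instances actually running."""
    return {
        # Running but unknown to the push service: these silently miss every update.
        "unregistered": running - registered,
        # Registered but no longer running: stale entries that cause failed pushes.
        "stale": registered - running,
    }


report = audit_push_inventory(
    registered={"i-001", "i-002", "i-003"},
    running={"i-002", "i-003", "i-004"},
)
print(sorted(report["unregistered"]))  # ['i-004']
print(sorted(report["stale"]))         # ['i-001']
```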
Multi-Environment Architecture
An environment is a logical grouping of deployed infrastructure providing the resources, platform services, and controls needed to run a specific set of workloads. Multi-environment architecture falls into three categories:
```mermaid
flowchart TD
A["Multi-environment needs"] --> B["Delivery environments\n(path to production)"]
A --> C["Split environments\n(manageability & ownership)"]
A --> D["Replica environments\n(scale, geography, user bases)"]
```
Complex systems combine all three - product groups may have their own delivery pipelines feeding into separate production replicas.
Delivery Environments
Changes to software, infrastructure, or configuration move through a series of delivery environments before reaching production - the path to production. Environments earlier in the flow are upstream; production is downstream.
Three Tensions to Balance
| Concern | What it means |
|---|---|
| Segregation | Environments must not interfere with each other; upstream testing must never affect downstream data |
| Consistency | Differences across stages invalidate tests and complicate deployments - this is a primary driver for adopting IaC |
| Variation | Some differences are unavoidable: scaling capacity, access levels, resource IDs, naming conventions |
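One way to hold consistency and variation in balance is to define a single shared configuration and allow per-environment overrides only for an explicit list of keys. A minimal sketch; the keys, values, and environment names are hypothetical.

```python
SHARED = {
    "app_image": "shop-app:1.4.2",
    "instance_count": 2,
    "admin_access": "engineers",
    "name_prefix": "shop",
}

# Only these kinds of differences (capacity, access, naming) may vary between environments.
ALLOWED_VARIATION = {"instance_count", "admin_access", "name_prefix"}

OVERRIDES = {
    "test": {"name_prefix": "shop-test"},
    "prod": {"instance_count": 6, "admin_access": "on-call only", "name_prefix": "shop-prod"},
}


def environment_config(env: str) -> dict[str, object]:
    """Merge shared config with an environment's overrides, rejecting disallowed variation."""
    overrides = OVERRIDES.get(env, {})
    illegal = set(overrides) - ALLOWED_VARIATION
    if illegal:
        raise ValueError(f"{env} may not vary {sorted(illegal)}")
    return {**SHARED, **overrides}
```

Leaving `app_image` off the allow-list means the artifact tested upstream is guaranteed to be the one deployed downstream, while distinct name prefixes help keep environments segregated.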
Delivery Patterns
Separate delivery environments: Each distinct production workload gets its own dev/test pipeline. Required when production systems are fundamentally different from each other.
Fan-out delivery: A single shared dev/test pipeline validates changes, then deploys them simultaneously to multiple identical production environments (e.g., the same storefront deployed to multiple regions).
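Fan-out delivery can be sketched as promotion through shared upstream stages followed by deployment to every production replica. A minimal sketch; the stage and region names and the `deploy`/`passes_tests` callables are hypothetical, and the production deploys run one after another here for simplicity where a real pipeline might run them in parallel.

```python
from typing import Callable

UPSTREAM = ["dev", "test", "staging"]           # shared path to production
PRODUCTION = ["prod-eu", "prod-us", "prod-ap"]  # identical production replicas


def fan_out_release(version: str,
                    deploy: Callable[[str, str], None],
                    passes_tests: Callable[[str, str], bool]) -> list[str]:
    """Promote stage by stage; fan out to all replicas only if every upstream stage passes."""
    for stage in UPSTREAM:
        deploy(stage, version)
        if not passes_tests(stage, version):
            raise RuntimeError(f"{version} failed in {stage}; production untouched")
    for env in PRODUCTION:
        deploy(env, version)
    return PRODUCTION


deployed: list[tuple[str, str]] = []
fan_out_release("1.4.2",
                deploy=lambda env, v: deployed.append((env, v)),
                passes_tests=lambda env, v: True)
print([env for env, _ in deployed])
# ['dev', 'test', 'staging', 'prod-eu', 'prod-us', 'prod-ap']
```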
Environment Ownership Warning
Splitting Environments
When systems grow too large for a single environment, they are split along three dimensions:
1 · System Architecture Alignment
Sharing an environment creates coupling - the more workloads share an environment, the more coordination is required for changes to shared infrastructure. Split along service boundaries to keep coupling low.
- Shared-nothing systems (two distinct brands’ storefronts) can live in completely separate environments
- Integrated systems (storefronts + shared data service) can still be split into cohesive individual environments as long as integration is loose enough to allow independent changes
2 · Organisational Alignment
Teams tend to own environments. A new team for a new service naturally leads to a new environment. A shared platform team naturally leads to shared environments.
This is Conway’s Law applied to infrastructure: the environment structure mirrors the org structure. Be deliberate about whether that is the outcome you want.
3 · Governance Alignment
Separate environments make compliance easier to enforce and audit:
- Blast radius: A compromised application environment cannot reach backend systems if they are in a separate environment
- Log integrity: Security monitoring services in their own isolated environment cannot be tampered with by attackers who compromise the application tier
- Pipeline safety: Delivery pipeline infrastructure in its own environment is protected from damage caused by the workloads it deploys
Replica Environments
Replica environments run the same software as a canonical production environment but serve distinct user bases, geographic regions, or availability zones.
Why Replicate
| Driver | Details |
|---|---|
| Availability | Traffic can be rerouted to a replica if one region fails; replicas provide independent redundancy units |
| Scalability | Add replicas to absorb traffic that a single environment cannot handle |
| Geographic latency | Replicas closer to users reduce round-trip times |
| Regulatory compliance | Regional replicas provide hard data residency boundaries, simplifying audit |
| Multiple user bases | White-label platforms can isolate each customer’s data in a dedicated replica |
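The availability, latency, and compliance drivers meet in routing: send each user to the closest healthy replica, but never fail over outside the regions their data is allowed to live in. A minimal sketch; region names, latencies, and health flags are hypothetical inputs that would normally come from DNS or a global load balancer.

```python
from typing import Optional


def choose_replica(latency_ms: dict[str, float],
                   healthy: dict[str, bool],
                   allowed: Optional[set[str]] = None) -> Optional[str]:
    """Pick the lowest-latency healthy replica, optionally limited to residency-approved regions."""
    for region in sorted(latency_ms, key=latency_ms.get):
        if healthy.get(region, False) and (allowed is None or region in allowed):
            return region
    return None  # no eligible replica: fail closed rather than break residency rules


latency = {"ca-central": 10.0, "us-east": 25.0, "eu-west": 80.0}
health = {"ca-central": False, "us-east": True, "eu-west": True}
print(choose_replica(latency, health))                          # us-east
print(choose_replica(latency, health, allowed={"ca-central"}))  # None
```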
Single-Tenant vs. Multi-Tenant
| Approach | Infrastructure | Trade-off |
|---|---|---|
| Single-tenant replicas | Separate environment per customer/brand | Strong data isolation; high maintenance cost at scale |
| Multi-tenant | One shared environment, software separates tenants | Efficient resource use; requires sophisticated application-level isolation |
Environment Layers and IaaS Resource Groups
Three Abstraction Layers
Environments can be implemented at three levels of abstraction:
| Layer | Shared resources | Isolation boundary |
|---|---|---|
| Physical | Data centre facilities only | Dedicated hardware per environment |
| Virtual | Physical hardware via IaaS | Dedicated virtual resources per environment |
| Configuration | Shared container cluster or serverless platform | Namespaces and config settings |
Configuration environments (namespaces on a shared cluster) are only viable for cloud-native containerised or serverless workloads. Even then, they carry risks:
- Namespace-level separation often fails regulatory requirements that demand hard segregation
- Conflicting workload profiles (low-latency vs. heavy analytics) compete for shared cluster resources
- Upgrading the shared runtime impacts all hosted environments simultaneously, forcing coordinated change windows across every team
- Cluster core service failure takes down all hosted environments - true availability requires independent clusters
IaaS Resource Groups
Every cloud platform provides a base-level grouping primitive:
| Cloud | Resource group primitive |
|---|---|
| AWS | Account |
| Azure | Resource group |
| GCP | Project |
These primitives define the default boundary for access policies, billing, and resource naming. Your environment architecture must explicitly map logical environments to these cloud primitives.
Common (but problematic) approach: Multiple environments in one resource group.
This usually happens because creating new accounts or projects involves a heavyweight approval process. The result: shared access policies, shared resource naming, and complex in-group segregation with tags - all less reliable than simply using separate groups.
Recommended approaches:
| Model | Structure | Best for |
|---|---|---|
| One group per environment | dev account / test account / prod account | Most organisations |
| Multiple groups per environment | app account + management account + monitoring account = one production environment | Regulated industries requiring hard segregation between workloads, delivery pipelines, and observability |
The second model is particularly powerful for governance: the application account has no access to the management or monitoring accounts, so a compromised workload cannot disable its own monitoring.
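The mapping from logical environments to resource groups can be checked automatically. A minimal sketch of the second model, assuming a hand-written mapping and access grants; the account names and the "app" key convention are hypothetical, and real enforcement belongs in the platform's IAM or organisation policies rather than a script.

```python
# Logical environment -> resource groups (e.g. AWS accounts) that implement it.
ENVIRONMENTS = {
    "dev": {"app": "dev-app"},
    "test": {"app": "test-app"},
    "prod": {"app": "prod-app", "management": "prod-management", "monitoring": "prod-monitoring"},
}

# Which groups each group can reach.
ACCESS = {
    "prod-management": {"prod-app"},  # pipelines deploy into the app account
    "prod-monitoring": {"prod-app"},  # monitoring reads from the app account
}


def check_mapping(environments: dict[str, dict[str, str]],
                  access: dict[str, set[str]]) -> list[str]:
    """Flag groups shared between environments and app groups that reach sibling groups."""
    problems = []
    owner: dict[str, str] = {}
    for env, groups in environments.items():
        for group in groups.values():
            if group in owner:
                problems.append(f"{group} is shared by {owner[group]} and {env}")
            else:
                owner[group] = env
        app = groups["app"]
        # A compromised workload must not be able to touch its own pipeline or monitoring.
        siblings = set(groups.values()) - {app}
        for target in sorted(access.get(app, set()) & siblings):
            problems.append(f"{app} can access {target}")
    return problems
```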
Application Runtime Platforms
Application runtime platforms determine where and how workloads execute. Three compute models intersect with IaC:
Server Clusters
Traditional clusters consist of identically configured servers running the same workloads. Modern IaC-managed clusters use event-driven scaling - the platform automatically adds or removes servers based on load metrics and health checks, with software delivered via pull deployments or baked images rather than push-scripting tools.
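Event-driven scaling comes down to a decision re-evaluated whenever a metric or health event arrives. A minimal sketch, assuming average CPU utilisation as the load metric and hypothetical thresholds and bounds; managed offerings such as AWS Auto Scaling groups, Azure Virtual Machine Scale Sets, and GCP managed instance groups express the same idea as declarative policies.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    target_cpu_pct: int = 60  # hypothetical target average CPU utilisation
    min_size: int = 2
    max_size: int = 10


def plan_scaling(policy: ScalingPolicy, healthy: int, unhealthy: int,
                 avg_cpu_pct: int) -> dict[str, int]:
    """Decide how many instances to launch and terminate after a metrics or health event."""
    if healthy == 0:
        desired = policy.min_size
    else:
        # Integer ceiling of healthy * load / target: enough instances to bring load to target.
        desired = -(-healthy * avg_cpu_pct // policy.target_cpu_pct)
        desired = max(policy.min_size, min(policy.max_size, desired))
    return {
        "desired": desired,
        "launch": max(0, desired - healthy),
        # Instances failing health checks are always replaced; surplus healthy ones are scaled in.
        "terminate": unhealthy + max(0, healthy - desired),
    }


print(plan_scaling(ScalingPolicy(), healthy=4, unhealthy=0, avg_cpu_pct=90))
# {'desired': 6, 'launch': 2, 'terminate': 0}
```

Integer percentages keep the ceiling exact; with floats, a ratio such as 3.6 / 0.6 can land just above 6 and round up to 7.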
Application Clusters (Container Orchestration)
Application clusters are pools of servers where individual application instances are dynamically scheduled and replaced. Kubernetes dominates this space.
Two provisioning models:
| Model | Examples | Notes |
|---|---|---|
| Cluster as a Service | Amazon EKS, Azure AKS, Google GKE | Managed control plane; integrates naturally with IaC for networking/storage |
| Packaged distributions | Red Hat OpenShift, Rancher RKE, VMware Tanzu | Self-managed; consistent across cloud providers |
Serverless / FaaS
Code executes purely on demand, triggered by events (inbound messages, schedules, IaaS lifecycle events). The platform manages ports, memory, and process lifecycle.
| | Details |
|---|---|
| Advantages | Engineers focus on business logic; highly efficient for unpredictable workloads |
| Challenges | Cold-start latency; limited portability across providers (AWS Lambda ≠ Azure Functions) |
| IaC relevance | Serverless does not eliminate IaC - shared storage, networking, and event infrastructure still needs to be defined as code. Serverless shifts system concerns out of application code into the infrastructure layer |
Cluster Topologies
When building container clusters or serverless platforms, you can choose among four topologies, each trading off governance, optimisation, ownership, and upgrade continuity:
| Topology | Description | Best for | Challenge |
|---|---|---|---|
| Multiple environments, one cluster | QA, Staging, and Production share a single cluster | Small orgs, fully containerised systems | Upgrade coordination grows complex at scale |
| One cluster per environment | Separate cluster for each environment | Governance simplicity; safe cluster upgrades | Ensuring consistency across clusters requires automated IaC delivery |
| Multiple clusters per environment | Each team gets a dedicated cluster within their environment | Strict ownership and workload optimisation | Massive management overhead; risk of resource waste from undersized clusters |
| Cross-environment clusters | Clusters divided by purpose (public services, internal services, data processing) rather than environment | Mixed workloads with distinct resource profiles | These are shared environments - integrated services span cluster boundaries |
Application-Driven Infrastructure Design
Infrastructure design should start with the workloads, not the other way around. The workflow:
- Identify workloads - break the system into separately deployable services
- Map required capabilities - describe what each service needs functionally (networking, messaging, storage, compute) without specifying the technology
- Determine implementations - match abstract capabilities to specific services
| Capability | Implementation options |
|---|---|
| Networking | IaaS VPC, subnets, firewall rules |
| Container cluster | GKE, EKS, self-managed Kubernetes |
| Async messaging | Cloud Pub/Sub, SQS, RabbitMQ |
| AI/ML | SaaS API (external vendor) |
| Search | Self-hosted open-source tool |
This design flow prevents the common failure mode of designing infrastructure first and then trying to make the workloads fit.
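The three-step flow can be captured as data: workloads declare abstract capabilities, and a separate catalogue maps each capability to an implementation. A minimal sketch; the workload names and the chosen implementations are hypothetical picks from the options in the table above.

```python
# Steps 1 and 2: workloads and the capabilities each needs, stated without technology choices.
WORKLOADS = {
    "storefront": {"networking", "container_cluster", "search"},
    "order_service": {"networking", "container_cluster", "async_messaging"},
    "recommendations": {"networking", "ai_ml"},
}

# Step 3: one chosen implementation per capability (hypothetical choices).
IMPLEMENTATIONS = {
    "networking": "IaaS VPC, subnets, firewall rules",
    "container_cluster": "EKS",
    "async_messaging": "SQS",
    "ai_ml": "SaaS API",
    "search": "self-hosted open-source search",
}


def infrastructure_plan(workloads: dict[str, set[str]],
                        implementations: dict[str, str]) -> dict[str, str]:
    """Derive the infrastructure to build from workload needs; fail on unmet capabilities."""
    needed = set().union(*workloads.values())
    missing = needed - set(implementations)
    if missing:
        raise ValueError(f"no implementation chosen for {sorted(missing)}")
    # Capabilities nobody needs are simply not built.
    return {cap: implementations[cap] for cap in sorted(needed)}


plan = infrastructure_plan(WORKLOADS, IMPLEMENTATIONS)
print(plan)
```

The design choice is that the plan is derived from the workloads, so an implementation appears only when some workload actually requires its capability.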
Cloud Native Software
“Cloud native” has multiple overlapping definitions in the industry:
| Source | Definition |
|---|---|
| Practical definition | Software designed and implemented to leverage cloud platform capabilities, built to adapt to shifting needs in capacity, availability, and locality |
| Industry shorthand | Containerised workloads running on Kubernetes |
| CNCF | Systems deployed at scale in a programmatic and repeatable manner; characterised as loosely coupled, secure, resilient, manageable, and observable |
For teams building applications to run on cloud infrastructure, the Twelve-Factor App methodology provides a concrete set of design principles to ensure portability and operability across cloud environments.